“Being a square keeps you from going around in circles.” (J. Vernon McGee.)
Tetragon allows you to predict a sequence of a time feature by expanding the matrix of distances among sequences. Below you can see a general depiction of Tetragon’s rationale. This implementation allows you to optimize a compact set of hyper-parameters (`seq_len`, `method`, `distr`) through Random Search. The main phases of computation can be summarized as follows:
1. Each time feature is differenced, then divided into sequences of length equal to `seq_len`.
2. For each differenced and segmented time feature, a matrix of distances is calculated within each expanding window (`n_windows`), testing different symmetric measures (`method` allows for “euclidean”, “manhattan”, “chebyshev”, “gower”, “lorentzian”, “jaccard”, “dice”, “squared_euclidean”, “divergence”, “clark”, “avg”).
3. Each distance matrix is normalized by min-max, then expanded. By matrix expansion we mean the operation depicted below: an attempt at guessing the distance weights for the last column (the next sequence), given that in a symmetric distance matrix the last value of the column is zero by definition.
4. For each row, each matrix is expanded using a truncated distribution (`distr` allows for “norm”, “cauchy”, “logis”, “t”, “exp”), and a thousand different candidate columns are generated.
5. For each generation, the new sequence is estimated as a weighted average of the previous sequences, using the reciprocals of the last column values as weights.
6. The predicted values are integrated back to the original scale to calculate testing errors and confidence intervals for each point in the sequence (for each time feature).
The process flow of Tetragon
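The phases listed above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the toy series, the non-overlapping segmentation, the choice of a truncated normal parameterized by each row's mean and standard deviation, and the helper names `draw_column` and `predict_diff` are all hypothetical.

```r
# Toy series; the data and every helper name here are hypothetical
set.seed(42)
x <- cumsum(rnorm(41))
seq_len <- 5

# 1. Difference the series, then cut it into sequences of length seq_len
dx <- diff(x)                                          # 40 differences
segments <- matrix(dx, ncol = seq_len, byrow = TRUE)   # 8 sequences x 5 points

# 2. Symmetric distance matrix among sequences (zero diagonal)
d <- as.matrix(dist(segments, method = "euclidean"))

# 3. Min-max normalization to [0, 1]
d_norm <- (d - min(d)) / (max(d) - min(d))

# 4. Guess the distances between each past sequence and the unknown next one:
#    one draw per row from a normal truncated to [0, 1] (inverse-CDF sampling),
#    using the row mean and sd as parameters (an assumption of this sketch)
draw_column <- function(d_norm) {
  m <- rowMeans(d_norm)
  s <- apply(d_norm, 1, sd)
  u <- runif(length(m), pnorm(0, m, s), pnorm(1, m, s))
  qnorm(u, m, s)
}

# 5. Weighted average of past sequences; weights are reciprocals of the guesses
predict_diff <- function(segments, column, eps = 1e-6) {
  w <- 1 / (column + eps)          # eps guards against division by zero
  as.vector(w %*% segments) / sum(w)
}

n_draws <- 1000
draws <- t(replicate(n_draws, predict_diff(segments, draw_column(d_norm))))

# 6. Integrate back to the original scale and summarize each future point
paths <- tail(x, 1) + t(apply(draws, 1, cumsum))       # n_draws x seq_len
pred <- apply(paths, 2, quantile, probs = c(0.1, 0.5, 0.9))
pred
```

Sequences that are guessed to be close to the next one (small distance) get a large weight, which is why the reciprocal is used in step 5.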
In our introduction to Tetragon, we are going to use an old data frame with daily and cumulative Covid cases and deaths in Europe from March 2021 to August 2021: a small data set with four time features arranged as columns of a data frame.
date | daily_cases | daily_deaths | cumulative_cases | cumulative_deaths |
---|---|---|---|---|
2021-03-02 | 102125 | 1973 | 22724397 | 549967 |
2021-03-03 | 117049 | 2755 | 22841446 | 552722 |
2021-03-04 | 133743 | 2461 | 22975189 | 555183 |
2021-03-05 | 133057 | 2724 | 23108246 | 557907 |
2021-03-06 | 126833 | 2111 | 23235079 | 560018 |
2021-03-07 | 103745 | 1527 | 23338824 | 561545 |
2021-03-08 | 88904 | 1535 | 23427728 | 563080 |
2021-03-09 | 107841 | 2157 | 23535569 | 565237 |
2021-03-10 | 129334 | 2580 | 23664903 | 567817 |
2021-03-11 | 148639 | 2416 | 23813542 | 570233 |
In the first example, we predict the next 10 days for two time features, reducing the standard number of expanding windows for cross-validation (`n_windows` is set to 3 instead of 10, meaning that the model is tested 3 times according to an expanding scheme; when there is not enough data for the validation windows, a message is displayed).
example1 <- tetragon(covid_in_europe[, c("daily_cases", "daily_deaths")], seq_len = 10, dates = covid_in_europe$date, method = "euclidean", distr = "exp", n_windows = 3, n_sample = 1)
time: 0.89 sec elapsed
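To make the expanding scheme more concrete, here is a minimal sketch of how three expanding validation windows could be laid out. How tetragon actually splits the data is not shown in this document, so the equal-block layout, the helper name `expanding_windows`, and the series length of 160 are assumptions made purely for illustration.

```r
# Hypothetical helper: index ranges for an expanding-window validation scheme
expanding_windows <- function(n, n_windows) {
  # Split 1..n into n_windows + 1 roughly equal blocks
  edges <- round(seq(0, n, length.out = n_windows + 2))
  lapply(seq_len(n_windows), function(w) {
    list(train = seq_len(edges[w + 1]),
         test  = (edges[w + 1] + 1):edges[w + 2])
  })
}

splits <- expanding_windows(n = 160, n_windows = 3)

# Each training set starts at 1 and grows; each test block follows it
sapply(splits, function(s) c(train_end  = max(s$train),
                             test_start = min(s$test),
                             test_end   = max(s$test)))
```

With fewer windows, each test block is longer and the model is fitted fewer times, which is one reason the run above completes quickly.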
The result is a list of different components, as you can see below.
names(example1)
[1] "exploration" "history" "best" "time_log"
The first variable, `exploration`, includes all the models generated during the random search. The second variable, `history`, summarizes the hyper-parameters selected by the user or through random search, together with the related error metrics (see footnote 1). Besides the predictions for each feature, `best` includes testing error statistics, prediction stats and plots for each one.
names(example1$best)
[1] "predictions" "testing_errors" "plots"
The prediction is a list including the predicted results for each time feature (quantile, min, max, mean, mode, sd, skewness, kurtosis, etc. for each time point in the `seq_len` sequence). Let’s see the prediction table for the first time feature.
knitr::kable(example1$best$predictions[[1]], align = "ccc", caption = "Predictions for daily Covid cases in Europe")
date | min | 10% | 25% | 50% | 75% | 90% | max | mean | sd | mode | kurtosis | skewness | iqr_to_range | risk_ratio | upside_prob | divergence | entropy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-08-12 | 29494 | 36571.0 | 39707.0 | 42966 | 46422.0 | 50144.8 | 61868 | 43325.94 | 5868.115 | 44541.04 | 4.010 | 0.658 | 0.2074195 | 1.403058 | NA | NA | 6.898762 |
2021-08-13 | 15093 | 29890.0 | 36067.0 | 42581 | 50325.0 | 60147.0 | 72431 | 43443.61 | 11477.092 | 41271.57 | 2.972 | 0.395 | 0.2486658 | 1.085928 | 0.487 | 0.990 | 6.872923 |
2021-08-14 | 6414 | 28120.0 | 35329.0 | 43418 | 50384.5 | 56663.0 | 95753 | 43875.77 | 15169.204 | 44854.23 | 5.502 | 0.861 | 0.1685210 | 1.414307 | 0.533 | 0.952 | 6.847995 |
2021-08-15 | 8297 | 27150.4 | 40759.5 | 50045 | 57198.0 | 64589.7 | 105443 | 49712.23 | 17048.930 | 51671.99 | 4.938 | 0.621 | 0.1692144 | 1.326962 | 0.602 | 0.966 | 6.847656 |
2021-08-16 | 2452 | 22121.0 | 33011.0 | 39363 | 46123.0 | 52914.0 | 78231 | 39129.08 | 13017.453 | 39442.62 | 4.452 | 0.124 | 0.1730295 | 1.053019 | 0.331 | 0.953 | 6.847161 |
2021-08-17 | 19716 | 31420.0 | 38120.0 | 42400 | 48168.0 | 51748.0 | 69669 | 42744.43 | 9704.995 | 42425.11 | 3.820 | 0.319 | 0.2011491 | 1.202125 | 0.565 | 0.924 | 6.881801 |
2021-08-18 | 38129 | 47005.5 | 49754.0 | 54793 | 57998.0 | 60783.0 | 71848 | 54117.75 | 6349.018 | 55887.94 | 3.600 | -0.207 | 0.2444912 | 1.023464 | 0.846 | 0.990 | 6.900765 |
2021-08-19 | 23747 | 39750.0 | 46869.0 | 50084 | 53779.0 | 58015.0 | 71423 | 49850.44 | 7992.403 | 49818.05 | 4.745 | -0.329 | 0.1449367 | 0.810229 | 0.343 | 0.968 | 6.894386 |
2021-08-20 | 11707 | 35310.0 | 42920.0 | 49192 | 58626.0 | 64973.0 | 92135 | 50419.57 | 13782.792 | 47188.54 | 4.286 | 0.246 | 0.1952803 | 1.145605 | 0.510 | 0.952 | 6.869059 |
2021-08-21 | 27772 | 39543.0 | 44994.0 | 54057 | 61759.0 | 73170.0 | 127966 | 56581.26 | 18827.413 | 48847.84 | 7.773 | 1.992 | 0.1673254 | 2.811832 | 0.592 | 0.964 | 6.859645 |
In version 1.1, IQR to range, risk ratio, upside probability, divergence and entropy have been added directly inside the prediction table, and the terminal values (calculated at both ends of each sequence) have been dropped. A few brief explanations:
1. IQR to range: almost self-explanatory, the normalization of the IQR to the min-max range allows for comparison among different time features;
2. risk ratio: here we mean the ratio between the range above the median and the range below it (in financial series, it allows you to understand how deep the precipice is even when the trend is going up);
3. upside probability: the probability of getting a larger value compared to the previous point (an annotation here: in most cases the value is around 50%);
4. divergence: we replaced the average Kullback-Leibler divergence with a simpler measure of divergence, quite similar to the Chebyshev distance: in our case, the maximum distance between subsequent ECDFs;
5. entropy: a new entry computed with a specific package (see footnote 2); it could be of interest to understand how entropy evolves in long-term forecasting and how entropy is related to good or bad predictions.
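The first four measures can be illustrated with a short sketch. The IQR-to-range and risk-ratio formulas follow the definitions above and, applied to the quantiles in the first row of the prediction table, reproduce the tabulated values. Upside probability and divergence need the sampled values themselves, which are not available here, so `prev` and `curr` are simulated stand-ins; treat this as one plausible reading, not the package's code.

```r
# Quantiles copied from the first row (2021-08-12) of the prediction table
q <- c(min = 29494, q25 = 39707, q50 = 42966, q75 = 46422, max = 61868)

iqr_to_range <- unname((q["q75"] - q["q25"]) / (q["max"] - q["min"]))
risk_ratio   <- unname((q["max"] - q["q50"]) / (q["q50"] - q["min"]))
round(c(iqr_to_range, risk_ratio), 4)   # 0.2074 1.4031, as in the table

# The next two measures need the sampled values themselves;
# `prev` and `curr` are simulated stand-ins for two consecutive time points
set.seed(1)
prev <- rnorm(1000, mean = 100, sd = 10)
curr <- rnorm(1000, mean = 102, sd = 12)

# Upside probability: share of draws where the value rises from one point to the next
upside_prob <- mean(curr > prev)

# Divergence: maximum vertical gap between the two empirical CDFs
grid <- sort(c(prev, curr))
divergence <- max(abs(ecdf(curr)(grid) - ecdf(prev)(grid)))
```

Both of the last two measures compare a time point with the previous one, which is consistent with the `NA` values in the first row of the table.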
For each time feature included in the model, you get a plot of the median with the chosen confidence interval (the default for `ci` is 0.8).
example1$best$plots
$daily_cases
$daily_deaths
Now, the question is simple: can we get a better prediction by searching the hyper-space with Random Search? Let’s give it a try. The following example shows you how to sample 100 different models from a compact hyper-parameter space: we are searching for the best `method` and `distr` (you can restrict the parameters to a subset of the available options and search for `seq_len` too).
example2 <- tetragon(covid_in_europe[, c("daily_cases", "daily_deaths")], seq_len = 10, n_sample = 100, n_windows = 3)
time: 102.76 sec elapsed
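The sampling step of a random search like this one can be sketched as follows. The option lists are the ones named in the text (the history table further down also shows an `empirical` value for `distr`, which is left out here), and drawing one `method` and one `distr` per time feature mirrors the paired values in that table. The data frame `candidates` and the scoring step described in the comments are hypothetical, not the package's internals.

```r
# Option lists copied from the text; one method and one distr per time feature
methods <- c("euclidean", "manhattan", "chebyshev", "gower", "lorentzian",
             "jaccard", "dice", "squared_euclidean", "divergence", "clark", "avg")
distrs  <- c("norm", "cauchy", "logis", "t", "exp")

n_sample  <- 100
n_feature <- 2

set.seed(123)
candidates <- data.frame(
  seq_len = 10,   # held fixed here, as in example2
  method  = replicate(n_sample,
                      paste(sample(methods, n_feature, replace = TRUE), collapse = ", ")),
  distr   = replicate(n_sample,
                      paste(sample(distrs, n_feature, replace = TRUE), collapse = ", "))
)
head(candidates)

# Each row would then be fitted and scored on the validation windows,
# and the rows ranked by the chosen error metric.
```

Because the combinations are drawn at random, the same pair can appear more than once, and run time grows roughly linearly with `n_sample`.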
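If we compare the error statistics from the best model in `example2` with the naive model in `example1`, we see an improvement on most metrics: MAE and MAPE, for instance, are lower for both features, although ME and MSE for `daily_cases` get worse.
The error statistics from `example1`: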
knitr::kable(round(example1$best$testing_errors, 2), align = "ccc", caption = "Testing errors for each time feature BEFORE random search")
feature | pred_scores | me | mae | mse | rmsse | mpe | mape | rmae | rrmse | rame | mase | smse | sce | gmrae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
daily_cases | 0.22 | -2566.83 | 14079.20 | 457535707.3 | 140.37 | -0.14 | 0.31 | 1.04 | 1.03 | 2.28 | 0.87 | 27268.05 | -1.73 | 1.01 |
daily_deaths | 0.22 | -68.57 | 354.11 | 248933.8 | 23.03 | 0.05 | 0.31 | 0.90 | 0.93 | 1.05 | 1.00 | 687.76 | -1.55 | 0.68 |
The error statistics from `example2`:
knitr::kable(round(example2$best$testing_errors, 2), align = "ccc", caption = "Testing errors for each time feature AFTER random search")
feature | pred_scores | me | mae | mse | rmsse | mpe | mape | rmae | rrmse | rame | mase | smse | sce | gmrae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
daily_cases | 0.51 | -5429.85 | 13112.13 | 522919285 | 139.83 | -0.11 | 0.27 | 0.93 | 0.98 | 0.42 | 0.81 | 31122.21 | -3.20 | 0.80 |
daily_deaths | 0.25 | -83.38 | 315.56 | 215040 | 20.89 | 0.00 | 0.27 | 0.79 | 0.82 | 0.86 | 0.89 | 591.87 | -2.07 | 0.69 |
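As a quick check on the comparison, the relative change in two of the metrics can be computed directly from the two tables. The MAE and MAPE values below are copied from the tables above; the data frames `before` and `after` are just illustrative containers.

```r
# MAE and MAPE copied from the two testing-error tables above
before <- data.frame(mae = c(14079.20, 354.11), mape = c(0.31, 0.31),
                     row.names = c("daily_cases", "daily_deaths"))
after  <- data.frame(mae = c(13112.13, 315.56), mape = c(0.27, 0.27),
                     row.names = c("daily_cases", "daily_deaths"))

# Relative change in percent; negative values mean a lower error after the search
pct_change <- round(100 * (after - before) / before, 1)
pct_change
```

The same arithmetic applied to the other columns shows that not every metric moves in the same direction, so it is worth deciding in advance which metric matters for your use case.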
A closer look at the history table:
knitr::kable(example2$history, align = "ccc", caption = "Search history (100 samples)")
sample | seq_len | method | distr | avg_pred_scores | avg_me | avg_mae | avg_mse | avg_rmsse | avg_mpe | avg_mape | avg_rmae | avg_rrmse | avg_rame | avg_mase | avg_smse | avg_sce | avg_gmrae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | divergence, divergence | empirical, t | 0.37985 | -2756.6148 | 6713.846 | 261567162 | 80.35933 | -0.0580000 | 0.2700000 | 0.8593333 | 0.9008333 | 0.640500 | 0.8491667 | 15857.039 | -2.6373333 | 0.7456667 |
91 | 10 | euclidean , divergence | t , empirical | 0.30375 | -1598.3510 | 6555.080 | 201442525 | 77.14700 | -0.0571667 | 0.2953333 | 0.9133333 | 0.9510000 | 1.538833 | 0.8795000 | 12340.490 | -1.4605000 | 0.7755000 |
32 | 10 | clark, clark | cauchy, exp | 0.29745 | -5615.8407 | 9312.216 | 599246599 | 109.04050 | -0.0790000 | 0.3488333 | 1.0473333 | 1.0961667 | 2.027167 | 1.0588333 | 35819.043 | -3.1976667 | 0.8678333 |
48 | 10 | clark , manhattan | cauchy, exp | 0.29340 | -5486.0452 | 8916.275 | 569185018 | 104.32350 | -0.0920000 | 0.2986667 | 0.9360000 | 0.9878333 | 1.507500 | 0.9910000 | 33990.298 | -4.0850000 | 0.7561667 |
76 | 10 | divergence, avg | empirical, norm | 0.29205 | -2054.6215 | 6688.655 | 251093428 | 81.60133 | 0.0055000 | 0.2860000 | 0.9175000 | 0.9783333 | 1.087333 | 0.8833333 | 15301.284 | -1.7448333 | 0.7508333 |
85 | 10 | jaccard, clark | exp , empirical | 0.29050 | -844.7142 | 6508.213 | 185421568 | 74.59367 | -0.0393333 | 0.2791667 | 0.8871667 | 0.9115000 | 1.231833 | 0.8655000 | 11378.589 | -1.4760000 | 0.7943333 |
58 | 10 | dice , clark | logis, exp | 0.28920 | -481.1907 | 6074.388 | 153100908 | 70.42583 | 0.0013333 | 0.2995000 | 0.9110000 | 0.9445000 | 1.133500 | 0.8556667 | 9469.215 | -0.3476667 | 0.8278333 |
19 | 10 | gower, gower | exp, t | 0.28785 | -1407.7273 | 6465.874 | 204133246 | 75.65033 | -0.0405000 | 0.2733333 | 0.8646667 | 0.9055000 | 1.229500 | 0.8446667 | 12458.500 | -1.5706667 | 0.6878333 |
4 | 10 | manhattan, clark | exp, exp | 0.28715 | -895.8275 | 6785.242 | 202380568 | 77.97783 | 0.0056667 | 0.3130000 | 0.9426667 | 0.9880000 | 1.316333 | 0.8823333 | 12358.717 | -0.2581667 | 0.7996667 |
51 | 10 | jaccard , euclidean | exp , cauchy | 0.27750 | -1074.2392 | 6151.684 | 168395538 | 72.07883 | -0.0348333 | 0.2813333 | 0.8803333 | 0.9085000 | 1.308667 | 0.8486667 | 10360.166 | -1.3445000 | 0.8106667 |
21 | 10 | lorentzian, gower | t , exp | 0.27595 | -869.9140 | 6355.827 | 177296773 | 73.57150 | -0.0276667 | 0.2801667 | 0.8820000 | 0.9151667 | 1.315667 | 0.8505000 | 10879.900 | -1.2433333 | 0.7696667 |
50 | 10 | jaccard , chebyshev | exp , logis | 0.26985 | -980.8005 | 6011.796 | 152465063 | 70.28667 | -0.0573333 | 0.2816667 | 0.8748333 | 0.9068333 | 1.607500 | 0.8403333 | 9451.682 | -1.6213333 | 0.7505000 |
52 | 10 | divergence, divergence | t , norm | 0.26790 | -1611.2940 | 6943.843 | 235279575 | 82.94167 | 0.0178333 | 0.3678333 | 1.0525000 | 1.1001667 | 1.421667 | 0.9920000 | 14378.800 | 0.2153333 | 0.9493333 |
78 | 10 | avg , clark | logis, logis | 0.26740 | -1101.4172 | 6611.396 | 195036859 | 76.89033 | -0.0163333 | 0.3193333 | 0.9511667 | 0.9875000 | 1.440167 | 0.8953333 | 11945.786 | -0.4395000 | 0.8666667 |
80 | 10 | lorentzian, clark | cauchy, t | 0.26470 | -1188.6183 | 6541.933 | 198199937 | 75.86633 | -0.0483333 | 0.2776667 | 0.8670000 | 0.9041667 | 1.164167 | 0.8411667 | 12106.506 | -1.6791667 | 0.7401667 |
62 | 10 | divergence, divergence | t , exp | 0.26020 | -1701.8977 | 7247.789 | 239660193 | 84.21867 | -0.0130000 | 0.3735000 | 1.0575000 | 1.0985000 | 1.827167 | 0.9728333 | 14644.128 | -0.2953333 | 0.8871667 |
64 | 10 | jaccard , squared_euclidean | logis, exp | 0.25940 | -878.8358 | 6387.124 | 178724136 | 73.96700 | -0.0425000 | 0.2761667 | 0.8785000 | 0.9116667 | 1.297833 | 0.8540000 | 10989.925 | -1.5446667 | 0.7793333 |
7 | 10 | divergence, lorentzian | empirical, norm | 0.25895 | -948.8515 | 7524.854 | 271185548 | 86.75583 | 0.0030000 | 0.2896667 | 0.9555000 | 1.0055000 | 1.160500 | 0.9150000 | 16502.917 | -1.4636667 | 0.6811667 |
8 | 10 | gower, clark | logis , empirical | 0.25660 | -1083.1680 | 6300.028 | 174554642 | 73.26717 | -0.0498333 | 0.2880000 | 0.8865000 | 0.9266667 | 1.494833 | 0.8505000 | 10740.963 | -1.3463333 | 0.7136667 |
69 | 10 | jaccard, jaccard | exp, t | 0.25300 | -731.6817 | 7107.943 | 219951581 | 79.91767 | -0.0211667 | 0.2881667 | 0.9166667 | 0.9466667 | 1.115667 | 0.8866667 | 13401.045 | -1.3150000 | 0.7690000 |
22 | 10 | manhattan , squared_euclidean | exp , logis | 0.24685 | -574.6447 | 6184.675 | 151779049 | 70.17333 | -0.0380000 | 0.2788333 | 0.8695000 | 0.8948333 | 1.569500 | 0.8161667 | 9358.199 | -0.9401667 | 0.7651667 |
20 | 10 | jaccard , manhattan | t, t | 0.24685 | -1134.8733 | 6473.737 | 182647693 | 74.60767 | -0.0531667 | 0.2878333 | 0.8956667 | 0.9250000 | 1.554333 | 0.8560000 | 11210.897 | -1.5323333 | 0.7021667 |
3 | 10 | clark, gower | t , logis | 0.24670 | -1668.1400 | 6500.985 | 210819726 | 76.72100 | -0.0385000 | 0.2831667 | 0.8803333 | 0.9176667 | 1.413333 | 0.8533333 | 12838.904 | -1.5671667 | 0.7536667 |
86 | 10 | avg , gower | cauchy, exp | 0.24495 | -1313.7755 | 6582.964 | 191716205 | 75.78000 | -0.0615000 | 0.2876667 | 0.8900000 | 0.9253333 | 1.663333 | 0.8483333 | 11743.182 | -1.6861667 | 0.7060000 |
45 | 10 | squared_euclidean, lorentzian | logis, t | 0.24485 | -678.3088 | 6134.132 | 156846427 | 71.16417 | -0.0445000 | 0.2850000 | 0.8900000 | 0.9225000 | 1.378000 | 0.8590000 | 9732.013 | -1.5423333 | 0.7261667 |
54 | 10 | divergence, clark | exp , norm | 0.24460 | -6826.5583 | 10560.672 | 737105771 | 120.11600 | -0.1206667 | 0.3896667 | 1.1878333 | 1.2086667 | 2.698500 | 1.1436667 | 44079.989 | -3.8466667 | 1.0766667 |
73 | 10 | divergence, euclidean | exp, t | 0.24150 | -6489.9573 | 10404.374 | 735204637 | 118.16450 | -0.1328333 | 0.3376667 | 1.0726667 | 1.1073333 | 2.400833 | 1.0995000 | 43896.297 | -4.8745000 | 0.8695000 |
13 | 10 | gower , manhattan | t , logis | 0.24105 | -1121.7063 | 6371.332 | 182720230 | 74.34567 | -0.0288333 | 0.2903333 | 0.9046667 | 0.9361667 | 1.306167 | 0.8715000 | 11222.789 | -1.2818333 | 0.7831667 |
60 | 10 | clark , squared_euclidean | logis, t | 0.24080 | -5160.4487 | 9327.230 | 590455365 | 108.13700 | -0.0948333 | 0.3136667 | 0.9860000 | 1.0308333 | 1.936167 | 1.0210000 | 35274.667 | -3.8776667 | 0.8060000 |
59 | 10 | avg , clark | logis, t | 0.23585 | -1013.9622 | 6257.372 | 175179673 | 72.84633 | -0.0406667 | 0.2773333 | 0.8733333 | 0.9083333 | 1.363333 | 0.8463333 | 10765.391 | -1.3565000 | 0.7176667 |
43 | 10 | manhattan , lorentzian | cauchy, t | 0.23535 | -1173.8897 | 6174.214 | 165110203 | 71.41467 | -0.0533333 | 0.2850000 | 0.8778333 | 0.9061667 | 1.642000 | 0.8293333 | 10161.166 | -1.5353333 | 0.7778333 |
55 | 10 | squared_euclidean, dice | logis, exp | 0.23455 | -705.7505 | 6517.286 | 176386482 | 74.15900 | -0.0528333 | 0.2853333 | 0.8963333 | 0.9223333 | 1.451833 | 0.8608333 | 10857.134 | -1.4308333 | 0.7746667 |
79 | 10 | divergence, jaccard | t , cauchy | 0.23315 | -1703.3458 | 6676.441 | 219777949 | 77.64833 | -0.0425000 | 0.2868333 | 0.8836667 | 0.9203333 | 1.384167 | 0.8620000 | 13369.606 | -1.8043333 | 0.7450000 |
26 | 10 | clark , manhattan | t, t | 0.23220 | -1088.9512 | 7020.025 | 215157771 | 79.79267 | -0.0576667 | 0.2918333 | 0.9200000 | 0.9495000 | 1.528833 | 0.8951667 | 13155.277 | -1.7805000 | 0.7691667 |
88 | 10 | avg, avg | exp , norm | 0.23200 | -1260.1432 | 6692.617 | 199888002 | 76.99783 | -0.0481667 | 0.2916667 | 0.9155000 | 0.9420000 | 1.591500 | 0.8916667 | 12257.487 | -1.5770000 | 0.7796667 |
81 | 10 | euclidean, euclidean | t, t | 0.23170 | -642.9992 | 6679.146 | 186953417 | 76.51717 | -0.0280000 | 0.2946667 | 0.9203333 | 0.9550000 | 1.425333 | 0.8856667 | 11480.644 | -1.2540000 | 0.7725000 |
18 | 10 | clark , squared_euclidean | norm , cauchy | 0.23035 | -5766.2057 | 9291.684 | 607199016 | 108.55433 | -0.0988333 | 0.3175000 | 0.9960000 | 1.0396667 | 1.988833 | 1.0321667 | 36271.596 | -4.0200000 | 0.8025000 |
82 | 10 | lorentzian, chebyshev | exp , norm | 0.23020 | -760.8563 | 6490.586 | 184039229 | 74.91850 | -0.0200000 | 0.2835000 | 0.8930000 | 0.9260000 | 1.276000 | 0.8591667 | 11273.216 | -1.1155000 | 0.7698333 |
100 | 10 | clark, dice | cauchy, cauchy | 0.22740 | -5502.2032 | 9245.353 | 584568211 | 107.22067 | -0.1223333 | 0.3163333 | 0.9911667 | 1.0243333 | 2.073500 | 1.0213333 | 34949.647 | -4.3208333 | 0.8360000 |
31 | 10 | dice , jaccard | norm , cauchy | 0.22710 | -729.1557 | 6190.017 | 163760792 | 72.25433 | -0.0343333 | 0.2835000 | 0.8911667 | 0.9285000 | 1.301667 | 0.8613333 | 10124.179 | -1.3891667 | 0.7101667 |
42 | 10 | squared_euclidean, euclidean | exp , logis | 0.22630 | -239.3847 | 6627.497 | 176047870 | 74.26167 | -0.0100000 | 0.2826667 | 0.9061667 | 0.9246667 | 1.193500 | 0.8718333 | 10813.507 | -0.7950000 | 0.8181667 |
83 | 10 | clark , euclidean | t , logis | 0.22505 | -1427.1773 | 6862.577 | 211467060 | 79.34267 | -0.0550000 | 0.3001667 | 0.9373333 | 0.9671667 | 1.715167 | 0.9010000 | 12946.126 | -1.6805000 | 0.7718333 |
99 | 10 | jaccard, dice | norm, exp | 0.22320 | -1038.6300 | 6717.920 | 204287115 | 76.92017 | -0.0465000 | 0.2695000 | 0.8696667 | 0.9026667 | 1.244500 | 0.8493333 | 12464.682 | -1.6465000 | 0.7506667 |
46 | 10 | gower , divergence | cauchy, t | 0.22235 | -1022.5437 | 6601.788 | 182762757 | 75.21933 | -0.0603333 | 0.2963333 | 0.9115000 | 0.9418333 | 1.643333 | 0.8646667 | 11232.244 | -1.5605000 | 0.7638333 |
33 | 10 | squared_euclidean, lorentzian | logis, exp | 0.22225 | -554.6870 | 6542.160 | 175352838 | 74.13350 | -0.0290000 | 0.2990000 | 0.9270000 | 0.9475000 | 1.446833 | 0.8930000 | 10801.494 | -1.1955000 | 0.7938333 |
30 | 10 | lorentzian, manhattan | t , cauchy | 0.22080 | -1419.8172 | 6859.918 | 225019319 | 79.40233 | -0.0446667 | 0.2800000 | 0.8905000 | 0.9283333 | 1.374167 | 0.8761667 | 13703.811 | -1.6801667 | 0.6518333 |
74 | 10 | manhattan , squared_euclidean | t , norm | 0.22075 | -1209.4188 | 6695.710 | 203547576 | 77.65383 | -0.0241667 | 0.2951667 | 0.9196667 | 0.9535000 | 1.521167 | 0.8873333 | 12441.809 | -1.1903333 | 0.7651667 |
68 | 10 | manhattan, jaccard | t , norm | 0.21980 | -1284.3032 | 6419.215 | 182770253 | 74.64483 | -0.0475000 | 0.2956667 | 0.9111667 | 0.9433333 | 1.635500 | 0.8723333 | 11235.239 | -1.5423333 | 0.7423333 |
38 | 10 | avg , chebyshev | logis, exp | 0.21955 | -1610.3110 | 6520.614 | 198390251 | 76.46667 | -0.0636667 | 0.2956667 | 0.9093333 | 0.9486667 | 1.640833 | 0.8786667 | 12173.210 | -2.0385000 | 0.7383333 |
2 | 10 | euclidean , divergence | norm , cauchy | 0.21380 | -1398.2370 | 6389.834 | 181781374 | 76.59017 | 0.0011667 | 0.3733333 | 1.0440000 | 1.0931667 | 1.659667 | 0.9583333 | 11248.290 | 0.2648333 | 0.8990000 |
36 | 10 | clark , divergence | cauchy, norm | 0.21290 | -5578.4527 | 9482.778 | 590306483 | 112.17367 | -0.0708333 | 0.4125000 | 1.1766667 | 1.2335000 | 2.307167 | 1.1328333 | 35403.728 | -2.4796667 | 0.9716667 |
75 | 10 | squared_euclidean, manhattan | t , cauchy | 0.21240 | -466.9628 | 6508.598 | 170450039 | 73.63067 | -0.0228333 | 0.2871667 | 0.8963333 | 0.9276667 | 1.533667 | 0.8448333 | 10456.566 | -0.7486667 | 0.7360000 |
14 | 10 | squared_euclidean, chebyshev | logis , cauchy | 0.21130 | -434.1252 | 6661.615 | 184405947 | 74.89550 | -0.0133333 | 0.2848333 | 0.9026667 | 0.9255000 | 1.190667 | 0.8678333 | 11302.026 | -0.8918333 | 0.7973333 |
6 | 10 | dice , gower | exp , norm | 0.20705 | -602.2478 | 6726.564 | 186994100 | 75.82367 | -0.0231667 | 0.2883333 | 0.9063333 | 0.9303333 | 1.358833 | 0.8648333 | 11447.051 | -1.0038333 | 0.8041667 |
29 | 10 | chebyshev, jaccard | t , norm | 0.20505 | -1238.3750 | 6639.366 | 194371012 | 76.70633 | -0.0505000 | 0.2995000 | 0.9205000 | 0.9486667 | 1.638500 | 0.8800000 | 11914.258 | -1.5926667 | 0.7581667 |
72 | 10 | dice , clark | empirical, logis | 0.20265 | -866.0878 | 6354.424 | 177958679 | 74.42783 | 0.0046667 | 0.3135000 | 0.9360000 | 0.9818333 | 1.279833 | 0.8778333 | 10934.783 | -0.2145000 | 0.8330000 |
9 | 10 | manhattan, chebyshev | cauchy, norm | 0.20215 | -1430.9840 | 6398.092 | 188161176 | 74.56583 | -0.0585000 | 0.2861667 | 0.8865000 | 0.9236667 | 1.587667 | 0.8613333 | 11559.914 | -1.8636667 | 0.7006667 |
90 | 10 | gower, avg | cauchy, norm | 0.19975 | -1121.0180 | 6418.189 | 179478430 | 74.24067 | -0.0505000 | 0.2861667 | 0.8916667 | 0.9260000 | 1.587500 | 0.8533333 | 11031.397 | -1.4951667 | 0.6866667 |
27 | 10 | jaccard, jaccard | norm , cauchy | 0.19760 | -873.6373 | 6512.593 | 184454874 | 74.38200 | -0.0518333 | 0.2738333 | 0.8641667 | 0.8970000 | 1.351000 | 0.8385000 | 11295.676 | -1.5745000 | 0.7505000 |
67 | 10 | euclidean, manhattan | cauchy, exp | 0.19685 | -948.5235 | 6814.524 | 194915552 | 77.31017 | -0.0568333 | 0.2888333 | 0.9166667 | 0.9401667 | 1.605167 | 0.8795000 | 11950.976 | -1.5485000 | 0.7848333 |
98 | 10 | lorentzian , squared_euclidean | norm , cauchy | 0.19325 | -1042.4753 | 6503.183 | 187346653 | 75.86083 | -0.0283333 | 0.2956667 | 0.9223333 | 0.9533333 | 1.477667 | 0.8913333 | 11519.460 | -1.2820000 | 0.7851667 |
70 | 10 | jaccard, gower | norm , cauchy | 0.19040 | -336.8587 | 6550.530 | 167404421 | 73.99267 | -0.0368333 | 0.2868333 | 0.9123333 | 0.9356667 | 1.441000 | 0.8783333 | 10346.005 | -1.2393333 | 0.7903333 |
15 | 10 | clark , euclidean | logis, t | 0.18990 | -5317.1587 | 10091.729 | 631653619 | 114.14983 | -0.1248333 | 0.3376667 | 1.0658333 | 1.0951667 | 2.376167 | 1.0636667 | 37757.621 | -4.1290000 | 0.8325000 |
40 | 10 | lorentzian, jaccard | norm, t | 0.18925 | -1116.1363 | 6428.041 | 178038459 | 74.48600 | -0.0543333 | 0.2933333 | 0.9013333 | 0.9380000 | 1.610167 | 0.8615000 | 10960.690 | -1.6995000 | 0.7551667 |
71 | 10 | manhattan, clark | empirical, cauchy | 0.18410 | -1046.9252 | 6305.790 | 177242293 | 73.87483 | -0.0170000 | 0.3051667 | 0.9151667 | 0.9571667 | 1.423000 | 0.8620000 | 10880.669 | -0.5423333 | 0.7826667 |
65 | 10 | avg , lorentzian | cauchy, norm | 0.18165 | -860.6475 | 6273.967 | 174503940 | 72.84700 | -0.0346667 | 0.2776667 | 0.8735000 | 0.9088333 | 1.244333 | 0.8441667 | 10721.652 | -1.3641667 | 0.6918333 |
5 | 10 | euclidean , squared_euclidean | logis, norm | 0.18110 | -879.9973 | 6428.592 | 171236022 | 73.47517 | -0.0370000 | 0.2951667 | 0.9133333 | 0.9341667 | 1.628333 | 0.8688333 | 10547.469 | -1.1835000 | 0.8273333 |
34 | 10 | euclidean, euclidean | t , empirical | 0.18110 | -1270.2000 | 6294.010 | 173398580 | 73.57017 | -0.0586667 | 0.2863333 | 0.8891667 | 0.9271667 | 1.607000 | 0.8513333 | 10679.290 | -1.7026667 | 0.6880000 |
97 | 10 | euclidean, chebyshev | exp , empirical | 0.18055 | -1370.9735 | 6146.852 | 178151364 | 72.51450 | -0.0505000 | 0.2770000 | 0.8600000 | 0.9036667 | 1.379333 | 0.8371667 | 10944.083 | -1.7485000 | 0.6326667 |
95 | 10 | manhattan , squared_euclidean | norm, norm | 0.17980 | -1418.2690 | 6725.072 | 198313386 | 76.73850 | -0.0490000 | 0.3006667 | 0.9303333 | 0.9471667 | 1.782500 | 0.8773333 | 12142.191 | -1.3891667 | 0.8326667 |
63 | 10 | chebyshev, euclidean | cauchy, norm | 0.17970 | -1629.5315 | 6426.457 | 191037529 | 75.41800 | -0.0608333 | 0.2950000 | 0.9081667 | 0.9410000 | 1.679167 | 0.8746667 | 11738.242 | -1.8870000 | 0.7041667 |
96 | 10 | chebyshev , lorentzian | t , norm | 0.17625 | -1261.2040 | 6807.739 | 206507881 | 77.68983 | -0.0553333 | 0.2870000 | 0.8978333 | 0.9310000 | 1.576500 | 0.8671667 | 12612.268 | -1.6558333 | 0.7565000 |
89 | 10 | gower, gower | empirical, t | 0.17400 | -1178.8008 | 6384.291 | 182716077 | 74.14950 | -0.0636667 | 0.2763333 | 0.8640000 | 0.9015000 | 1.487333 | 0.8383333 | 11214.671 | -1.8883333 | 0.6555000 |
25 | 10 | euclidean , lorentzian | empirical, exp | 0.17200 | -1057.1113 | 6463.954 | 184531090 | 74.55417 | -0.0573333 | 0.2790000 | 0.8748333 | 0.9073333 | 1.458333 | 0.8448333 | 11306.469 | -1.6208333 | 0.7008333 |
23 | 10 | manhattan, chebyshev | norm , cauchy | 0.17200 | -1370.5175 | 6580.256 | 185350254 | 75.58283 | -0.0733333 | 0.2943333 | 0.9210000 | 0.9433333 | 1.793333 | 0.8823333 | 11432.055 | -2.1041667 | 0.8101667 |
44 | 10 | squared_euclidean, dice | norm, norm | 0.17175 | -995.9355 | 6615.075 | 183190100 | 75.19283 | -0.0500000 | 0.3001667 | 0.9230000 | 0.9495000 | 1.622833 | 0.8786667 | 11279.641 | -1.5770000 | 0.7611667 |
39 | 10 | lorentzian , squared_euclidean | logis , empirical | 0.16890 | -1295.5753 | 6772.305 | 211441082 | 78.57483 | -0.0376667 | 0.2928333 | 0.9161667 | 0.9508333 | 1.486833 | 0.8876667 | 12914.098 | -1.4615000 | 0.7728333 |
66 | 10 | euclidean, jaccard | empirical, t | 0.16885 | -1098.8565 | 6515.307 | 187266082 | 75.28283 | -0.0626667 | 0.2800000 | 0.8796667 | 0.9161667 | 1.496500 | 0.8530000 | 11491.005 | -1.7535000 | 0.6946667 |
56 | 10 | jaccard , euclidean | logis , empirical | 0.16720 | -1307.4000 | 6279.768 | 184012233 | 73.89150 | -0.0480000 | 0.2763333 | 0.8666667 | 0.9081667 | 1.409000 | 0.8431667 | 11282.767 | -1.6855000 | 0.6720000 |
16 | 10 | clark , squared_euclidean | norm , empirical | 0.16560 | -5430.7632 | 9315.151 | 609923400 | 108.74117 | -0.0883333 | 0.3150000 | 0.9883333 | 1.0363333 | 1.745000 | 1.0368333 | 36424.723 | -3.9751667 | 0.7921667 |
47 | 10 | gower, dice | cauchy , empirical | 0.16385 | -1393.3568 | 6182.032 | 181112842 | 72.87733 | -0.0506667 | 0.2815000 | 0.8680000 | 0.9088333 | 1.450000 | 0.8438333 | 11118.197 | -1.6936667 | 0.6610000 |
92 | 10 | euclidean, euclidean | norm , cauchy | 0.16245 | -1259.2183 | 6476.160 | 186403439 | 75.18700 | -0.0528333 | 0.2913333 | 0.9035000 | 0.9360000 | 1.647833 | 0.8635000 | 11445.440 | -1.5420000 | 0.7458333 |
12 | 10 | lorentzian, lorentzian | exp , empirical | 0.16095 | -880.2260 | 6686.079 | 199903744 | 77.05400 | -0.0300000 | 0.2786667 | 0.8903333 | 0.9251667 | 1.118167 | 0.8663333 | 12219.831 | -1.4766667 | 0.7308333 |
35 | 10 | dice , gower | norm , empirical | 0.15845 | -1066.3492 | 5911.657 | 163935110 | 70.34217 | -0.0326667 | 0.2686667 | 0.8475000 | 0.8895000 | 1.133667 | 0.8235000 | 10101.107 | -1.4908333 | 0.6725000 |
77 | 10 | manhattan, manhattan | empirical, t | 0.15725 | -1116.9092 | 6350.289 | 179840078 | 73.84683 | -0.0506667 | 0.2820000 | 0.8776667 | 0.9096667 | 1.485500 | 0.8501667 | 11045.919 | -1.6420000 | 0.7210000 |
41 | 10 | manhattan, manhattan | t , empirical | 0.15550 | -1255.4373 | 6728.119 | 198467090 | 77.16867 | -0.0536667 | 0.2946667 | 0.9168333 | 0.9470000 | 1.651667 | 0.8795000 | 12161.848 | -1.6371667 | 0.7295000 |
24 | 10 | squared_euclidean, dice | empirical, logis | 0.15530 | -1092.8768 | 6607.361 | 185194280 | 75.89983 | -0.0621667 | 0.2953333 | 0.9175000 | 0.9493333 | 1.656167 | 0.8825000 | 11412.088 | -1.8668333 | 0.7580000 |
37 | 10 | gower , divergence | empirical, cauchy | 0.15320 | -1067.6582 | 6415.687 | 182262223 | 76.61483 | 0.0138333 | 0.3573333 | 1.0178333 | 1.0728333 | 1.517000 | 0.9430000 | 11250.134 | 0.3671667 | 0.8396667 |
28 | 10 | manhattan, gower | exp , empirical | 0.14885 | -504.0503 | 7027.039 | 199733130 | 78.03017 | -0.0285000 | 0.2928333 | 0.9258333 | 0.9461667 | 1.394833 | 0.8920000 | 12217.467 | -1.1651667 | 0.7860000 |
94 | 10 | euclidean, euclidean | empirical, t | 0.14595 | -1160.7033 | 6425.139 | 183416791 | 74.77150 | -0.0513333 | 0.2855000 | 0.8901667 | 0.9300000 | 1.536167 | 0.8583333 | 11266.298 | -1.6791667 | 0.7083333 |
53 | 10 | lorentzian, avg | cauchy , empirical | 0.14425 | -837.6475 | 6278.195 | 169777714 | 72.68633 | -0.0376667 | 0.2823333 | 0.8826667 | 0.9128333 | 1.351667 | 0.8471667 | 10443.588 | -1.3626667 | 0.7215000 |
93 | 10 | gower , euclidean | empirical, logis | 0.14245 | -1127.8158 | 6295.230 | 177634023 | 73.25050 | -0.0475000 | 0.2761667 | 0.8626667 | 0.9035000 | 1.457000 | 0.8291667 | 10895.509 | -1.4705000 | 0.6831667 |
1 | 10 | euclidean , lorentzian | norm, norm | 0.13720 | -1101.1345 | 6483.005 | 176958641 | 74.11217 | -0.0680000 | 0.2875000 | 0.8975000 | 0.9271667 | 1.686500 | 0.8510000 | 10895.183 | -1.7213333 | 0.6593333 |
87 | 10 | clark , euclidean | cauchy , empirical | 0.13675 | -5900.5617 | 9496.173 | 621630384 | 110.56267 | -0.1245000 | 0.3250000 | 1.0155000 | 1.0630000 | 2.200833 | 1.0416667 | 37159.605 | -4.4956667 | 0.7438333 |
84 | 10 | lorentzian, chebyshev | norm , empirical | 0.11860 | -1198.7222 | 6368.122 | 184717611 | 74.47500 | -0.0496667 | 0.2813333 | 0.8775000 | 0.9181667 | 1.458833 | 0.8511667 | 11330.715 | -1.6315000 | 0.6720000 |
17 | 10 | avg , jaccard | norm , empirical | 0.11130 | -1470.6315 | 6762.974 | 202334165 | 77.49817 | -0.0671667 | 0.2995000 | 0.9235000 | 0.9516667 | 1.769333 | 0.8791667 | 12399.887 | -1.8636667 | 0.7368333 |
49 | 10 | avg , lorentzian | logis , empirical | 0.10995 | -985.7722 | 6351.975 | 172105924 | 73.12800 | -0.0543333 | 0.2800000 | 0.8783333 | 0.9171667 | 1.552833 | 0.8455000 | 10598.459 | -1.6270000 | 0.6521667 |
57 | 10 | lorentzian, divergence | empirical, exp | 0.09620 | -1031.3753 | 6460.685 | 185656640 | 76.30867 | 0.0118333 | 0.3488333 | 0.9925000 | 1.0413333 | 1.399667 | 0.9198333 | 11420.129 | 0.1795000 | 0.8826667 |
61 | 10 | euclidean, avg | empirical, empirical | 0.07515 | -1096.2438 | 6469.849 | 184241686 | 75.01450 | -0.0473333 | 0.2870000 | 0.8953333 | 0.9318333 | 1.539667 | 0.8621667 | 11311.896 | -1.5465000 | 0.7011667 |
11 | 10 | avg , gower | empirical, empirical | 0.06125 | -1260.6433 | 6348.656 | 182447867 | 74.20667 | -0.0550000 | 0.2818333 | 0.8788333 | 0.9175000 | 1.516500 | 0.8491667 | 11200.435 | -1.6931667 | 0.6775000 |
Here are the best parameters discovered during the random search:
knitr::kable(example2$history[1,], align = "ccc", caption = "Testing errors for each time feature after random search")
sample | seq_len | method | distr | avg_pred_scores | avg_me | avg_mae | avg_mse | avg_rmsse | avg_mpe | avg_mape | avg_rmae | avg_rrmse | avg_rame | avg_mase | avg_smse | avg_sce | avg_gmrae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | divergence, divergence | empirical, t | 0.37985 | -2756.615 | 6713.846 | 261567162 | 80.35933 | -0.058 | 0.27 | 0.8593333 | 0.9008333 | 0.6405 | 0.8491667 | 15857.04 | -2.637333 | 0.7456667 |
Let’s have a look at the plots for the best model.
example2$best$plots
$daily_cases
$daily_deaths
1. The error metrics are calculated using the greybox package. For more information, see https://cran.r-project.org/web/packages/greybox/index.html
2. We used the entropy package with its base options. For more information, see https://cran.r-project.org/web/packages/entropy/index.html