Tetragon: a brief introduction

Giancarlo Vercellino

25-April-2022

"Being a square keeps you from going around in circles." (J. Vernon McGee)

Expanding the distance matrix to predict new sequences

Tetragon allows you to predict the next sequence of a time feature by expanding the matrix of distances among past sequences. Below you can see a general depiction of Tetragon’s rationale: this implementation allows you to optimize a compact set of hyper-parameters (seq_len, method, distr) through Random Search. The main phases of computation can be summarized as follows (a minimal illustrative sketch appears after the list):

  1. Each time feature is differentiated, then divided into sequences of length equal to seq_len.

  2. For each differentiated and segmented time feature, a matrix of distances among the sequences is calculated within each expanding validation window (n_windows), choosing among different symmetric measures (method allows for “euclidean”, “manhattan”, “chebyshev”, “gower”, “lorentzian”, “jaccard”, “dice”, “squared_euclidean”, “divergence”, “clark”, “avg”).

  3. Each distance matrix is normalized by min-max scaling, then expanded: by matrix expansion we mean the operation depicted below, an attempt at guessing the distance weights for the last column (the next, unobserved sequence), given that in a symmetric distance matrix the last value of that column (the diagonal element) is zero by definition.

  4. For each row, each matrix is expanded using a truncated distribution (distr allows for “norm”, “cauchy”, “logis”, “t”, “exp” and, as the search history below shows, “empirical”), and a thousand different potential columns are generated.

  5. For each generated column, the new sequence is estimated as a weighted average of the previous sequences, using the reciprocals of the last-column values as weights.

  6. The predicted values are integrated back to calculate testing errors and confidence intervals for each point in the sequence (for each time feature).
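
To make these phases concrete, here is a minimal R sketch of steps 1 to 6 for a single time feature. It is only an illustration under simplifying assumptions (non-overlapping segments, a plain euclidean distance, a normal draw standing in for the truncated distribution, and a single candidate column instead of a thousand), not the package’s internal code.

seq_len <- 10
y <- covid_in_europe$daily_cases              # any numeric time feature

# step 1: differentiate, then segment into sequences of length seq_len
dy <- diff(y)
n_seq <- floor(length(dy) / seq_len)
segments <- matrix(tail(dy, n_seq * seq_len), ncol = seq_len, byrow = TRUE)

# step 2: distance matrix among the segmented sequences (euclidean here)
dmat <- as.matrix(dist(segments, method = "euclidean"))

# step 3: min-max normalization of the distance matrix
dmat <- (dmat - min(dmat)) / (max(dmat) - min(dmat))

# steps 4-5 (simplified): draw one candidate "last column" of distances to the
# unseen next sequence, then average past sequences with reciprocal weights
new_col <- abs(rnorm(n_seq, mean = mean(dmat), sd = sd(dmat)))  # stand-in for the truncated draw
weights <- 1 / (new_col + 1e-9)
next_diff_seq <- colSums(segments * weights) / sum(weights)

# step 6 (simplified): integrate back to the level of the original series
next_seq <- tail(y, 1) + cumsum(next_diff_seq)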

The process flow of Tetragon

A simple example: Covid in Europe

In our introduction to Tetragon, we are going to use an old data frame with daily and cumulative cases of Covid infections and deaths in Europe from March 2021 to August 2021: a small data set including four different time features ordered in the columns of a data frame.

Examples of time features for Covid stats in Europe
date daily_cases daily_deaths cumulative_cases cumulative_deaths
2021-03-02 102125 1973 22724397 549967
2021-03-03 117049 2755 22841446 552722
2021-03-04 133743 2461 22975189 555183
2021-03-05 133057 2724 23108246 557907
2021-03-06 126833 2111 23235079 560018
2021-03-07 103745 1527 23338824 561545
2021-03-08 88904 1535 23427728 563080
2021-03-09 107841 2157 23535569 565237
2021-03-10 129334 2580 23664903 567817
2021-03-11 148639 2416 23813542 570233

In the first example, we are predicting the next 10 days for two time features, reducing the standard expanding windows for cross-validation (n_windows is set to 3 instead of 10, meaning that the model is tested 3 times according to an expanding scheme; when there is not enough data for the validation windows, a message will be displayed).

example1 <- tetragon(covid_in_europe[, c("daily_cases", "daily_deaths")], seq_len = 10, dates = covid_in_europe$date, method = "euclidean", distr = "exp", n_windows = 3, n_sample = 1)
  time: 0.89 sec elapsed

The result is a list of different components, as you can see below.

names(example1)
  [1] "exploration" "history"     "best"        "time_log"

The first variable, exploration, includes all the models generated during the random search. The second variable, history, summarizes the hyper-parameters selected by the user or through random search, together with the related error metrics1. Besides the predictions for each feature, best includes testing error statistics, prediction stats and plots for each one.

names(example1$best)
  [1] "predictions"    "testing_errors" "plots"

The predictions element is a list including the predicted results for each time feature (quantile, min, max, mean, mode, sd, skewness, kurtosis, etc. for each time point in the seq_len sequence). Let’s see the prediction table for the first time feature (a small indexing example follows the table).

knitr::kable(example1$best$predictions[[1]], align = "ccc", caption = "Predictions for daily Covid cases in Europe")
Predictions for daily Covid cases in Europe
min 10% 25% 50% 75% 90% max mean sd mode kurtosis skewness iqr_to_range risk_ratio upside_prob divergence entropy
2021-08-12 29494 36571.0 39707.0 42966 46422.0 50144.8 61868 43325.94 5868.115 44541.04 4.010 0.658 0.2074195 1.403058 NA NA 6.898762
2021-08-13 15093 29890.0 36067.0 42581 50325.0 60147.0 72431 43443.61 11477.092 41271.57 2.972 0.395 0.2486658 1.085928 0.487 0.990 6.872923
2021-08-14 6414 28120.0 35329.0 43418 50384.5 56663.0 95753 43875.77 15169.204 44854.23 5.502 0.861 0.1685210 1.414307 0.533 0.952 6.847995
2021-08-15 8297 27150.4 40759.5 50045 57198.0 64589.7 105443 49712.23 17048.930 51671.99 4.938 0.621 0.1692144 1.326962 0.602 0.966 6.847656
2021-08-16 2452 22121.0 33011.0 39363 46123.0 52914.0 78231 39129.08 13017.453 39442.62 4.452 0.124 0.1730295 1.053019 0.331 0.953 6.847161
2021-08-17 19716 31420.0 38120.0 42400 48168.0 51748.0 69669 42744.43 9704.995 42425.11 3.820 0.319 0.2011491 1.202125 0.565 0.924 6.881801
2021-08-18 38129 47005.5 49754.0 54793 57998.0 60783.0 71848 54117.75 6349.018 55887.94 3.600 -0.207 0.2444912 1.023464 0.846 0.990 6.900765
2021-08-19 23747 39750.0 46869.0 50084 53779.0 58015.0 71423 49850.44 7992.403 49818.05 4.745 -0.329 0.1449367 0.810229 0.343 0.968 6.894386
2021-08-20 11707 35310.0 42920.0 49192 58626.0 64973.0 92135 50419.57 13782.792 47188.54 4.286 0.246 0.1952803 1.145605 0.510 0.952 6.869059
2021-08-21 27772 39543.0 44994.0 54057 61759.0 73170.0 127966 56581.26 18827.413 48847.84 7.773 1.992 0.1673254 2.811832 0.592 0.964 6.859645
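
As a quick usage note (assuming the column names shown in the rendered table above), the prediction table can be indexed like a regular data frame, for example to pull out the median forecast together with the 10% and 90% quantiles:

pred <- example1$best$predictions[[1]]
band <- data.frame(lower = pred[, "10%"], median = pred[, "50%"], upper = pred[, "90%"])
head(band)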

In version 1.1, IQR to range, risk ratio, upside probability, divergence and entropy have been added directly inside the prediction table, and the terminal values (calculated at both ends of each sequence) have been dismissed. Just a few brief explanations here (a small illustrative sketch follows the list):

1. IQR to range: almost self-explanatory, the normalization of the IQR to the min-max range allows for comparison among different time features;

2. risk ratio: here we mean the ratio between the range above the median and the range below it (in financial series this allows you to understand how deep the precipice is even when the trend is going up);

3. upside probability: the probability of getting a larger value compared to the previous point (an annotation here: in most cases the value is around 50%);

4. divergence: we dismissed the average Kullback-Leibler divergence in favor of a simpler measure, quite similar to the Chebyshev distance: in our case, the maximum distance between the ECDFs of subsequent points;

5. entropy: the last new entry, based on a dedicated package2 (it could be of interest to understand how entropy evolves in long-term forecasting and how entropy relates to good or bad predictions).
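
For clarity, here is a small illustrative sketch (not the package code) of how these statistics could be computed from simulated values of two consecutive points of a sequence; the simulated samples, their pairing for the upside probability and the bin count for the entropy estimate are all assumptions made for the example.

set.seed(42)
prev_point <- rnorm(1000, mean = 43000, sd = 6000)   # hypothetical simulated values at time t - 1
this_point <- rnorm(1000, mean = 43500, sd = 11000)  # hypothetical simulated values at time t

# 1. IQR to range: interquartile range normalized by the min-max range
iqr_to_range <- IQR(this_point) / diff(range(this_point))

# 2. risk ratio: range above the median over range below the median
med <- median(this_point)
risk_ratio <- (max(this_point) - med) / (med - min(this_point))

# 3. upside probability: chance of a larger value than the previous point
upside_prob <- mean(this_point > prev_point)

# 4. divergence: max distance between the ECDFs of subsequent points
grid <- sort(c(prev_point, this_point))
divergence <- max(abs(ecdf(this_point)(grid) - ecdf(prev_point)(grid)))

# 5. entropy: here on binned values (bin count is an arbitrary choice),
#    using the entropy package default estimator
entropy_est <- entropy::entropy(table(cut(this_point, breaks = 30)))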

For each time feature included in the model, you get a plot of the median with the chosen confidence interval (the ci default is 0.8).

example1$best$plots
  $daily_cases

  
  $daily_deaths

Automating the search for a better model

Now, the question is simple: can we get a better prediction by searching the hyper-space with Random Search? Let’s give it a try. The following example shows you how to sample 100 different models from a compact hyper-parameter space: here we are searching for the best method and distr (you can set the parameters of your choosing among the available options and search for seq_len too).

example2 <- tetragon(covid_in_europe[, c("daily_cases", "daily_deaths")], seq_len = 10, n_sample = 100, n_windows = 3)
  time: 102.76 sec elapsed

If we compare the error statistics from the best model in example2 with the naive model in example1, we see a clear improvement.

The error statistics from example1.

knitr::kable(round(example1$best$testing_errors, 2), align = "ccc", caption = "Testing errors for each time feature BEFORE random search")
Testing errors for each time feature BEFORE random search
pred_scores me mae mse rmsse mpe mape rmae rrmse rame mase smse sce gmrae
daily_cases 0.22 -2566.83 14079.20 457535707.3 140.37 -0.14 0.31 1.04 1.03 2.28 0.87 27268.05 -1.73 1.01
daily_deaths 0.22 -68.57 354.11 248933.8 23.03 0.05 0.31 0.90 0.93 1.05 1.00 687.76 -1.55 0.68

The error statistics from example2.

knitr::kable(round(example2$best$testing_errors, 2), align = "ccc", caption = "Testing errors for each time feature AFTER random search")
Testing errors for each time feature AFTER random search
pred_scores me mae mse rmsse mpe mape rmae rrmse rame mase smse sce gmrae
daily_cases 0.51 -5429.85 13112.13 522919285 139.83 -0.11 0.27 0.93 0.98 0.42 0.81 31122.21 -3.20 0.80
daily_deaths 0.25 -83.38 315.56 215040 20.89 0.00 0.27 0.79 0.82 0.86 0.89 591.87 -2.07 0.69

A closer look at the history table:

knitr::kable(example2$history, align = "ccc", caption = "Search history (100 samples)")
Search history (100 samples)
seq_len method distr avg_pred_scores avg_me avg_mae avg_mse avg_rmsse avg_mpe avg_mape avg_rmae avg_rrmse avg_rame avg_mase avg_smse avg_sce avg_gmrae
10 10 divergence, divergence empirical, t 0.37985 -2756.6148 6713.846 261567162 80.35933 -0.0580000 0.2700000 0.8593333 0.9008333 0.640500 0.8491667 15857.039 -2.6373333 0.7456667
91 10 euclidean , divergence t , empirical 0.30375 -1598.3510 6555.080 201442525 77.14700 -0.0571667 0.2953333 0.9133333 0.9510000 1.538833 0.8795000 12340.490 -1.4605000 0.7755000
32 10 clark, clark cauchy, exp 0.29745 -5615.8407 9312.216 599246599 109.04050 -0.0790000 0.3488333 1.0473333 1.0961667 2.027167 1.0588333 35819.043 -3.1976667 0.8678333
48 10 clark , manhattan cauchy, exp 0.29340 -5486.0452 8916.275 569185018 104.32350 -0.0920000 0.2986667 0.9360000 0.9878333 1.507500 0.9910000 33990.298 -4.0850000 0.7561667
76 10 divergence, avg empirical, norm 0.29205 -2054.6215 6688.655 251093428 81.60133 0.0055000 0.2860000 0.9175000 0.9783333 1.087333 0.8833333 15301.284 -1.7448333 0.7508333
85 10 jaccard, clark exp , empirical 0.29050 -844.7142 6508.213 185421568 74.59367 -0.0393333 0.2791667 0.8871667 0.9115000 1.231833 0.8655000 11378.589 -1.4760000 0.7943333
58 10 dice , clark logis, exp 0.28920 -481.1907 6074.388 153100908 70.42583 0.0013333 0.2995000 0.9110000 0.9445000 1.133500 0.8556667 9469.215 -0.3476667 0.8278333
19 10 gower, gower exp, t 0.28785 -1407.7273 6465.874 204133246 75.65033 -0.0405000 0.2733333 0.8646667 0.9055000 1.229500 0.8446667 12458.500 -1.5706667 0.6878333
4 10 manhattan, clark exp, exp 0.28715 -895.8275 6785.242 202380568 77.97783 0.0056667 0.3130000 0.9426667 0.9880000 1.316333 0.8823333 12358.717 -0.2581667 0.7996667
51 10 jaccard , euclidean exp , cauchy 0.27750 -1074.2392 6151.684 168395538 72.07883 -0.0348333 0.2813333 0.8803333 0.9085000 1.308667 0.8486667 10360.166 -1.3445000 0.8106667
21 10 lorentzian, gower t , exp 0.27595 -869.9140 6355.827 177296773 73.57150 -0.0276667 0.2801667 0.8820000 0.9151667 1.315667 0.8505000 10879.900 -1.2433333 0.7696667
50 10 jaccard , chebyshev exp , logis 0.26985 -980.8005 6011.796 152465063 70.28667 -0.0573333 0.2816667 0.8748333 0.9068333 1.607500 0.8403333 9451.682 -1.6213333 0.7505000
52 10 divergence, divergence t , norm 0.26790 -1611.2940 6943.843 235279575 82.94167 0.0178333 0.3678333 1.0525000 1.1001667 1.421667 0.9920000 14378.800 0.2153333 0.9493333
78 10 avg , clark logis, logis 0.26740 -1101.4172 6611.396 195036859 76.89033 -0.0163333 0.3193333 0.9511667 0.9875000 1.440167 0.8953333 11945.786 -0.4395000 0.8666667
80 10 lorentzian, clark cauchy, t 0.26470 -1188.6183 6541.933 198199937 75.86633 -0.0483333 0.2776667 0.8670000 0.9041667 1.164167 0.8411667 12106.506 -1.6791667 0.7401667
62 10 divergence, divergence t , exp 0.26020 -1701.8977 7247.789 239660193 84.21867 -0.0130000 0.3735000 1.0575000 1.0985000 1.827167 0.9728333 14644.128 -0.2953333 0.8871667
64 10 jaccard , squared_euclidean logis, exp 0.25940 -878.8358 6387.124 178724136 73.96700 -0.0425000 0.2761667 0.8785000 0.9116667 1.297833 0.8540000 10989.925 -1.5446667 0.7793333
7 10 divergence, lorentzian empirical, norm 0.25895 -948.8515 7524.854 271185548 86.75583 0.0030000 0.2896667 0.9555000 1.0055000 1.160500 0.9150000 16502.917 -1.4636667 0.6811667
8 10 gower, clark logis , empirical 0.25660 -1083.1680 6300.028 174554642 73.26717 -0.0498333 0.2880000 0.8865000 0.9266667 1.494833 0.8505000 10740.963 -1.3463333 0.7136667
69 10 jaccard, jaccard exp, t 0.25300 -731.6817 7107.943 219951581 79.91767 -0.0211667 0.2881667 0.9166667 0.9466667 1.115667 0.8866667 13401.045 -1.3150000 0.7690000
22 10 manhattan , squared_euclidean exp , logis 0.24685 -574.6447 6184.675 151779049 70.17333 -0.0380000 0.2788333 0.8695000 0.8948333 1.569500 0.8161667 9358.199 -0.9401667 0.7651667
20 10 jaccard , manhattan t, t 0.24685 -1134.8733 6473.737 182647693 74.60767 -0.0531667 0.2878333 0.8956667 0.9250000 1.554333 0.8560000 11210.897 -1.5323333 0.7021667
3 10 clark, gower t , logis 0.24670 -1668.1400 6500.985 210819726 76.72100 -0.0385000 0.2831667 0.8803333 0.9176667 1.413333 0.8533333 12838.904 -1.5671667 0.7536667
86 10 avg , gower cauchy, exp 0.24495 -1313.7755 6582.964 191716205 75.78000 -0.0615000 0.2876667 0.8900000 0.9253333 1.663333 0.8483333 11743.182 -1.6861667 0.7060000
45 10 squared_euclidean, lorentzian logis, t 0.24485 -678.3088 6134.132 156846427 71.16417 -0.0445000 0.2850000 0.8900000 0.9225000 1.378000 0.8590000 9732.013 -1.5423333 0.7261667
54 10 divergence, clark exp , norm 0.24460 -6826.5583 10560.672 737105771 120.11600 -0.1206667 0.3896667 1.1878333 1.2086667 2.698500 1.1436667 44079.989 -3.8466667 1.0766667
73 10 divergence, euclidean exp, t 0.24150 -6489.9573 10404.374 735204637 118.16450 -0.1328333 0.3376667 1.0726667 1.1073333 2.400833 1.0995000 43896.297 -4.8745000 0.8695000
13 10 gower , manhattan t , logis 0.24105 -1121.7063 6371.332 182720230 74.34567 -0.0288333 0.2903333 0.9046667 0.9361667 1.306167 0.8715000 11222.789 -1.2818333 0.7831667
60 10 clark , squared_euclidean logis, t 0.24080 -5160.4487 9327.230 590455365 108.13700 -0.0948333 0.3136667 0.9860000 1.0308333 1.936167 1.0210000 35274.667 -3.8776667 0.8060000
59 10 avg , clark logis, t 0.23585 -1013.9622 6257.372 175179673 72.84633 -0.0406667 0.2773333 0.8733333 0.9083333 1.363333 0.8463333 10765.391 -1.3565000 0.7176667
43 10 manhattan , lorentzian cauchy, t 0.23535 -1173.8897 6174.214 165110203 71.41467 -0.0533333 0.2850000 0.8778333 0.9061667 1.642000 0.8293333 10161.166 -1.5353333 0.7778333
55 10 squared_euclidean, dice logis, exp 0.23455 -705.7505 6517.286 176386482 74.15900 -0.0528333 0.2853333 0.8963333 0.9223333 1.451833 0.8608333 10857.134 -1.4308333 0.7746667
79 10 divergence, jaccard t , cauchy 0.23315 -1703.3458 6676.441 219777949 77.64833 -0.0425000 0.2868333 0.8836667 0.9203333 1.384167 0.8620000 13369.606 -1.8043333 0.7450000
26 10 clark , manhattan t, t 0.23220 -1088.9512 7020.025 215157771 79.79267 -0.0576667 0.2918333 0.9200000 0.9495000 1.528833 0.8951667 13155.277 -1.7805000 0.7691667
88 10 avg, avg exp , norm 0.23200 -1260.1432 6692.617 199888002 76.99783 -0.0481667 0.2916667 0.9155000 0.9420000 1.591500 0.8916667 12257.487 -1.5770000 0.7796667
81 10 euclidean, euclidean t, t 0.23170 -642.9992 6679.146 186953417 76.51717 -0.0280000 0.2946667 0.9203333 0.9550000 1.425333 0.8856667 11480.644 -1.2540000 0.7725000
18 10 clark , squared_euclidean norm , cauchy 0.23035 -5766.2057 9291.684 607199016 108.55433 -0.0988333 0.3175000 0.9960000 1.0396667 1.988833 1.0321667 36271.596 -4.0200000 0.8025000
82 10 lorentzian, chebyshev exp , norm 0.23020 -760.8563 6490.586 184039229 74.91850 -0.0200000 0.2835000 0.8930000 0.9260000 1.276000 0.8591667 11273.216 -1.1155000 0.7698333
100 10 clark, dice cauchy, cauchy 0.22740 -5502.2032 9245.353 584568211 107.22067 -0.1223333 0.3163333 0.9911667 1.0243333 2.073500 1.0213333 34949.647 -4.3208333 0.8360000
31 10 dice , jaccard norm , cauchy 0.22710 -729.1557 6190.017 163760792 72.25433 -0.0343333 0.2835000 0.8911667 0.9285000 1.301667 0.8613333 10124.179 -1.3891667 0.7101667
42 10 squared_euclidean, euclidean exp , logis 0.22630 -239.3847 6627.497 176047870 74.26167 -0.0100000 0.2826667 0.9061667 0.9246667 1.193500 0.8718333 10813.507 -0.7950000 0.8181667
83 10 clark , euclidean t , logis 0.22505 -1427.1773 6862.577 211467060 79.34267 -0.0550000 0.3001667 0.9373333 0.9671667 1.715167 0.9010000 12946.126 -1.6805000 0.7718333
99 10 jaccard, dice norm, exp 0.22320 -1038.6300 6717.920 204287115 76.92017 -0.0465000 0.2695000 0.8696667 0.9026667 1.244500 0.8493333 12464.682 -1.6465000 0.7506667
46 10 gower , divergence cauchy, t 0.22235 -1022.5437 6601.788 182762757 75.21933 -0.0603333 0.2963333 0.9115000 0.9418333 1.643333 0.8646667 11232.244 -1.5605000 0.7638333
33 10 squared_euclidean, lorentzian logis, exp 0.22225 -554.6870 6542.160 175352838 74.13350 -0.0290000 0.2990000 0.9270000 0.9475000 1.446833 0.8930000 10801.494 -1.1955000 0.7938333
30 10 lorentzian, manhattan t , cauchy 0.22080 -1419.8172 6859.918 225019319 79.40233 -0.0446667 0.2800000 0.8905000 0.9283333 1.374167 0.8761667 13703.811 -1.6801667 0.6518333
74 10 manhattan , squared_euclidean t , norm 0.22075 -1209.4188 6695.710 203547576 77.65383 -0.0241667 0.2951667 0.9196667 0.9535000 1.521167 0.8873333 12441.809 -1.1903333 0.7651667
68 10 manhattan, jaccard t , norm 0.21980 -1284.3032 6419.215 182770253 74.64483 -0.0475000 0.2956667 0.9111667 0.9433333 1.635500 0.8723333 11235.239 -1.5423333 0.7423333
38 10 avg , chebyshev logis, exp 0.21955 -1610.3110 6520.614 198390251 76.46667 -0.0636667 0.2956667 0.9093333 0.9486667 1.640833 0.8786667 12173.210 -2.0385000 0.7383333
2 10 euclidean , divergence norm , cauchy 0.21380 -1398.2370 6389.834 181781374 76.59017 0.0011667 0.3733333 1.0440000 1.0931667 1.659667 0.9583333 11248.290 0.2648333 0.8990000
36 10 clark , divergence cauchy, norm 0.21290 -5578.4527 9482.778 590306483 112.17367 -0.0708333 0.4125000 1.1766667 1.2335000 2.307167 1.1328333 35403.728 -2.4796667 0.9716667
75 10 squared_euclidean, manhattan t , cauchy 0.21240 -466.9628 6508.598 170450039 73.63067 -0.0228333 0.2871667 0.8963333 0.9276667 1.533667 0.8448333 10456.566 -0.7486667 0.7360000
14 10 squared_euclidean, chebyshev logis , cauchy 0.21130 -434.1252 6661.615 184405947 74.89550 -0.0133333 0.2848333 0.9026667 0.9255000 1.190667 0.8678333 11302.026 -0.8918333 0.7973333
6 10 dice , gower exp , norm 0.20705 -602.2478 6726.564 186994100 75.82367 -0.0231667 0.2883333 0.9063333 0.9303333 1.358833 0.8648333 11447.051 -1.0038333 0.8041667
29 10 chebyshev, jaccard t , norm 0.20505 -1238.3750 6639.366 194371012 76.70633 -0.0505000 0.2995000 0.9205000 0.9486667 1.638500 0.8800000 11914.258 -1.5926667 0.7581667
72 10 dice , clark empirical, logis 0.20265 -866.0878 6354.424 177958679 74.42783 0.0046667 0.3135000 0.9360000 0.9818333 1.279833 0.8778333 10934.783 -0.2145000 0.8330000
9 10 manhattan, chebyshev cauchy, norm 0.20215 -1430.9840 6398.092 188161176 74.56583 -0.0585000 0.2861667 0.8865000 0.9236667 1.587667 0.8613333 11559.914 -1.8636667 0.7006667
90 10 gower, avg cauchy, norm 0.19975 -1121.0180 6418.189 179478430 74.24067 -0.0505000 0.2861667 0.8916667 0.9260000 1.587500 0.8533333 11031.397 -1.4951667 0.6866667
27 10 jaccard, jaccard norm , cauchy 0.19760 -873.6373 6512.593 184454874 74.38200 -0.0518333 0.2738333 0.8641667 0.8970000 1.351000 0.8385000 11295.676 -1.5745000 0.7505000
67 10 euclidean, manhattan cauchy, exp 0.19685 -948.5235 6814.524 194915552 77.31017 -0.0568333 0.2888333 0.9166667 0.9401667 1.605167 0.8795000 11950.976 -1.5485000 0.7848333
98 10 lorentzian , squared_euclidean norm , cauchy 0.19325 -1042.4753 6503.183 187346653 75.86083 -0.0283333 0.2956667 0.9223333 0.9533333 1.477667 0.8913333 11519.460 -1.2820000 0.7851667
70 10 jaccard, gower norm , cauchy 0.19040 -336.8587 6550.530 167404421 73.99267 -0.0368333 0.2868333 0.9123333 0.9356667 1.441000 0.8783333 10346.005 -1.2393333 0.7903333
15 10 clark , euclidean logis, t 0.18990 -5317.1587 10091.729 631653619 114.14983 -0.1248333 0.3376667 1.0658333 1.0951667 2.376167 1.0636667 37757.621 -4.1290000 0.8325000
40 10 lorentzian, jaccard norm, t 0.18925 -1116.1363 6428.041 178038459 74.48600 -0.0543333 0.2933333 0.9013333 0.9380000 1.610167 0.8615000 10960.690 -1.6995000 0.7551667
71 10 manhattan, clark empirical, cauchy 0.18410 -1046.9252 6305.790 177242293 73.87483 -0.0170000 0.3051667 0.9151667 0.9571667 1.423000 0.8620000 10880.669 -0.5423333 0.7826667
65 10 avg , lorentzian cauchy, norm 0.18165 -860.6475 6273.967 174503940 72.84700 -0.0346667 0.2776667 0.8735000 0.9088333 1.244333 0.8441667 10721.652 -1.3641667 0.6918333
5 10 euclidean , squared_euclidean logis, norm 0.18110 -879.9973 6428.592 171236022 73.47517 -0.0370000 0.2951667 0.9133333 0.9341667 1.628333 0.8688333 10547.469 -1.1835000 0.8273333
34 10 euclidean, euclidean t , empirical 0.18110 -1270.2000 6294.010 173398580 73.57017 -0.0586667 0.2863333 0.8891667 0.9271667 1.607000 0.8513333 10679.290 -1.7026667 0.6880000
97 10 euclidean, chebyshev exp , empirical 0.18055 -1370.9735 6146.852 178151364 72.51450 -0.0505000 0.2770000 0.8600000 0.9036667 1.379333 0.8371667 10944.083 -1.7485000 0.6326667
95 10 manhattan , squared_euclidean norm, norm 0.17980 -1418.2690 6725.072 198313386 76.73850 -0.0490000 0.3006667 0.9303333 0.9471667 1.782500 0.8773333 12142.191 -1.3891667 0.8326667
63 10 chebyshev, euclidean cauchy, norm 0.17970 -1629.5315 6426.457 191037529 75.41800 -0.0608333 0.2950000 0.9081667 0.9410000 1.679167 0.8746667 11738.242 -1.8870000 0.7041667
96 10 chebyshev , lorentzian t , norm 0.17625 -1261.2040 6807.739 206507881 77.68983 -0.0553333 0.2870000 0.8978333 0.9310000 1.576500 0.8671667 12612.268 -1.6558333 0.7565000
89 10 gower, gower empirical, t 0.17400 -1178.8008 6384.291 182716077 74.14950 -0.0636667 0.2763333 0.8640000 0.9015000 1.487333 0.8383333 11214.671 -1.8883333 0.6555000
25 10 euclidean , lorentzian empirical, exp 0.17200 -1057.1113 6463.954 184531090 74.55417 -0.0573333 0.2790000 0.8748333 0.9073333 1.458333 0.8448333 11306.469 -1.6208333 0.7008333
23 10 manhattan, chebyshev norm , cauchy 0.17200 -1370.5175 6580.256 185350254 75.58283 -0.0733333 0.2943333 0.9210000 0.9433333 1.793333 0.8823333 11432.055 -2.1041667 0.8101667
44 10 squared_euclidean, dice norm, norm 0.17175 -995.9355 6615.075 183190100 75.19283 -0.0500000 0.3001667 0.9230000 0.9495000 1.622833 0.8786667 11279.641 -1.5770000 0.7611667
39 10 lorentzian , squared_euclidean logis , empirical 0.16890 -1295.5753 6772.305 211441082 78.57483 -0.0376667 0.2928333 0.9161667 0.9508333 1.486833 0.8876667 12914.098 -1.4615000 0.7728333
66 10 euclidean, jaccard empirical, t 0.16885 -1098.8565 6515.307 187266082 75.28283 -0.0626667 0.2800000 0.8796667 0.9161667 1.496500 0.8530000 11491.005 -1.7535000 0.6946667
56 10 jaccard , euclidean logis , empirical 0.16720 -1307.4000 6279.768 184012233 73.89150 -0.0480000 0.2763333 0.8666667 0.9081667 1.409000 0.8431667 11282.767 -1.6855000 0.6720000
16 10 clark , squared_euclidean norm , empirical 0.16560 -5430.7632 9315.151 609923400 108.74117 -0.0883333 0.3150000 0.9883333 1.0363333 1.745000 1.0368333 36424.723 -3.9751667 0.7921667
47 10 gower, dice cauchy , empirical 0.16385 -1393.3568 6182.032 181112842 72.87733 -0.0506667 0.2815000 0.8680000 0.9088333 1.450000 0.8438333 11118.197 -1.6936667 0.6610000
92 10 euclidean, euclidean norm , cauchy 0.16245 -1259.2183 6476.160 186403439 75.18700 -0.0528333 0.2913333 0.9035000 0.9360000 1.647833 0.8635000 11445.440 -1.5420000 0.7458333
12 10 lorentzian, lorentzian exp , empirical 0.16095 -880.2260 6686.079 199903744 77.05400 -0.0300000 0.2786667 0.8903333 0.9251667 1.118167 0.8663333 12219.831 -1.4766667 0.7308333
35 10 dice , gower norm , empirical 0.15845 -1066.3492 5911.657 163935110 70.34217 -0.0326667 0.2686667 0.8475000 0.8895000 1.133667 0.8235000 10101.107 -1.4908333 0.6725000
77 10 manhattan, manhattan empirical, t 0.15725 -1116.9092 6350.289 179840078 73.84683 -0.0506667 0.2820000 0.8776667 0.9096667 1.485500 0.8501667 11045.919 -1.6420000 0.7210000
41 10 manhattan, manhattan t , empirical 0.15550 -1255.4373 6728.119 198467090 77.16867 -0.0536667 0.2946667 0.9168333 0.9470000 1.651667 0.8795000 12161.848 -1.6371667 0.7295000
24 10 squared_euclidean, dice empirical, logis 0.15530 -1092.8768 6607.361 185194280 75.89983 -0.0621667 0.2953333 0.9175000 0.9493333 1.656167 0.8825000 11412.088 -1.8668333 0.7580000
37 10 gower , divergence empirical, cauchy 0.15320 -1067.6582 6415.687 182262223 76.61483 0.0138333 0.3573333 1.0178333 1.0728333 1.517000 0.9430000 11250.134 0.3671667 0.8396667
28 10 manhattan, gower exp , empirical 0.14885 -504.0503 7027.039 199733130 78.03017 -0.0285000 0.2928333 0.9258333 0.9461667 1.394833 0.8920000 12217.467 -1.1651667 0.7860000
94 10 euclidean, euclidean empirical, t 0.14595 -1160.7033 6425.139 183416791 74.77150 -0.0513333 0.2855000 0.8901667 0.9300000 1.536167 0.8583333 11266.298 -1.6791667 0.7083333
53 10 lorentzian, avg cauchy , empirical 0.14425 -837.6475 6278.195 169777714 72.68633 -0.0376667 0.2823333 0.8826667 0.9128333 1.351667 0.8471667 10443.588 -1.3626667 0.7215000
93 10 gower , euclidean empirical, logis 0.14245 -1127.8158 6295.230 177634023 73.25050 -0.0475000 0.2761667 0.8626667 0.9035000 1.457000 0.8291667 10895.509 -1.4705000 0.6831667
1 10 euclidean , lorentzian norm, norm 0.13720 -1101.1345 6483.005 176958641 74.11217 -0.0680000 0.2875000 0.8975000 0.9271667 1.686500 0.8510000 10895.183 -1.7213333 0.6593333
87 10 clark , euclidean cauchy , empirical 0.13675 -5900.5617 9496.173 621630384 110.56267 -0.1245000 0.3250000 1.0155000 1.0630000 2.200833 1.0416667 37159.605 -4.4956667 0.7438333
84 10 lorentzian, chebyshev norm , empirical 0.11860 -1198.7222 6368.122 184717611 74.47500 -0.0496667 0.2813333 0.8775000 0.9181667 1.458833 0.8511667 11330.715 -1.6315000 0.6720000
17 10 avg , jaccard norm , empirical 0.11130 -1470.6315 6762.974 202334165 77.49817 -0.0671667 0.2995000 0.9235000 0.9516667 1.769333 0.8791667 12399.887 -1.8636667 0.7368333
49 10 avg , lorentzian logis , empirical 0.10995 -985.7722 6351.975 172105924 73.12800 -0.0543333 0.2800000 0.8783333 0.9171667 1.552833 0.8455000 10598.459 -1.6270000 0.6521667
57 10 lorentzian, divergence empirical, exp 0.09620 -1031.3753 6460.685 185656640 76.30867 0.0118333 0.3488333 0.9925000 1.0413333 1.399667 0.9198333 11420.129 0.1795000 0.8826667
61 10 euclidean, avg empirical, empirical 0.07515 -1096.2438 6469.849 184241686 75.01450 -0.0473333 0.2870000 0.8953333 0.9318333 1.539667 0.8621667 11311.896 -1.5465000 0.7011667
11 10 avg , gower empirical, empirical 0.06125 -1260.6433 6348.656 182447867 74.20667 -0.0550000 0.2818333 0.8788333 0.9175000 1.516500 0.8491667 11200.435 -1.6931667 0.6775000

Here are the best parameters discovered during the random search:

knitr::kable(example2$history[1, ], align = "ccc", caption = "Best hyper-parameters selected through random search")
Best hyper-parameters selected through random search
seq_len method distr avg_pred_scores avg_me avg_mae avg_mse avg_rmsse avg_mpe avg_mape avg_rmae avg_rrmse avg_rame avg_mase avg_smse avg_sce avg_gmrae
10 10 divergence, divergence empirical, t 0.37985 -2756.615 6713.846 261567162 80.35933 -0.058 0.27 0.8593333 0.9008333 0.6405 0.8491667 15857.04 -2.637333 0.7456667

Let’s have a look at the plots for the best model.

example2$best$plots
  $daily_cases

  
  $daily_deaths

Some useful references


  1. The error metrics are calculated using the greybox package. For any info, you can look here: https://cran.r-project.org/web/packages/greybox/index.html

  2. We used the entropy package base options. For any information, you can look here: https://cran.r-project.org/web/packages/entropy/index.html