“Once the tower is built, the person who stacked the tower plays first. Moving in the game Jenga consists of:
taking one block on a turn from any level of the tower (except the one below an incomplete top level), and
placing it on the topmost level in order to complete it.
The game ends when the tower falls – completely or if any block falls from the tower (other than the block a player moves on a turn).”
(excerpts from Jenga’s Rules)
Jenga is a model to extrapolate time features using k-nearest neighbors. The name is a kind of joke: like in the game, Jenga lets you extrapolate a time feature “building on top of previous blocks” through the combination of nearest sequences, using kernel transformation and weighted mean.
Here is a description of Jenga’s main steps for the creation of a single model.
The overall model architecture
Jenga allows for the automatic search (random or grid) in a compact space of hyper-parameters:
seq_len: length of future projection (it may be adjusted for the length of time features, number of testing windows, degree of differentiation of each feature)
k: the number of neighbors to be used to calculate the extrapolation of each time feature
dist_method: the distance metrics used for each time-feature (using philentropy1 package: “euclidean”, “manhattan”, “chebyshev”, “gower”, “lorentzian”, “jaccard”, “dice”, “squared_euclidean”, “divergence”, “clark”, “avg”)
kernel: distribution used to calculate kernel densities/ weights (“norm”, “cauchy”, “logis”, “unif”, “t”), while others (“exp”, “lnorm” and “empirical”) have been dismissed from previous versions.
mode: how the set of sequences is created (through deterministic segmentation of each time feature or through random sampling for the same number of possible sequences)
In our introduction to Jenga, we are going to use the covid_in_europe
(from www.ecdc.europa.eu). As showed below the time features are expected in (ordered) columns in a dataframe format.
date | daily_cases | daily_deaths | cumulative_cases | cumulative_deaths |
---|---|---|---|---|
2021-03-02 | 102125 | 1973 | 22724397 | 549967 |
2021-03-03 | 117049 | 2755 | 22841446 | 552722 |
2021-03-04 | 133743 | 2461 | 22975189 | 555183 |
2021-03-05 | 133057 | 2724 | 23108246 | 557907 |
2021-03-06 | 126833 | 2111 | 23235079 | 560018 |
2021-03-07 | 103745 | 1527 | 23338824 | 561545 |
2021-03-08 | 88904 | 1535 | 23427728 | 563080 |
2021-03-09 | 107841 | 2157 | 23535569 | 565237 |
2021-03-10 | 129334 | 2580 | 23664903 | 567817 |
2021-03-11 | 148639 | 2416 | 23813542 | 570233 |
In the example, we are predicting four time features at the same time, projecting each feature for 10 steps in the future, 30 possible models are explored through random search and their error is tested with 30 windows. All the other hyper-parameters are randomly sampled from the complete set of possible values (but the user may select his own desired values). NA
values are dealt with kalman filter using the imputeTS package2.
library(jenga)
<- jenga(covid_in_europe[, 2:5], seq_len = 10, n_sample = 30, search = "random", seed = 42, n_windows = 30, dates = covid_in_europe$date)
example : 129.94 sec elapsed time
The result is a list of different components, as you can see below.
names(example)
1] "exploration" "history" "best" "time_log" [
The history
variable includes the models explored during the grid/random search. The best model is being selected according the highest prediction score, the lowest mean error and the lower mean absolute error. The prediction score measures the distance between the ground truth and the expected median: when the actual sequences derange from the prediction intervals, the prediction score is close to 0; when the actual sequences are close to the median of the prediction intervals, the prediction score gets close to 1 (quite a challenge to get above 0.5).
seq_len | k | dist_method | kernel | mode | pred_scores | avg_me | avg_mae | avg_mse | avg_rmsse | avg_mpe | avg_mape | avg_rmae | avg_rrmse | avg_rame | avg_mase | avg_smse | avg_sce | avg_gmrae | avg_pred_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 10 | 12 | dice , euclidean , lorentzian, clark | unif , cauchy, norm , t | segmented | 0.3349 | 20776.179 | 27369.20 | 3760088863 | 149.0090 | 0.3646750 | 0.4702917 | 1.3381500 | 1.3435833 | 4.736658 | 0.9668667 | 57757.36 | 6.595658 | 1.3453583 | 0.418975 |
16 | 10 | 12 | divergence, gower , dice , clark | t , unif, unif, t | sampled | 0.3314 | 19322.572 | 26183.73 | 3491987983 | 139.1536 | 0.1139333 | 0.3197833 | 0.9622417 | 0.9786167 | 2.654383 | 0.8369750 | 52151.08 | 4.812492 | 0.9301083 | 0.416425 |
28 | 10 | 12 | gower , manhattan, clark , dice | t , unif , cauchy, norm | segmented | 0.3263 | 19826.223 | 26930.88 | 3693082086 | 143.3051 | 0.0814750 | 0.2892583 | 0.9802667 | 1.0110833 | 2.689150 | 0.8770500 | 54197.78 | 5.218625 | 0.9373167 | 0.416900 |
30 | 10 | 12 | squared_euclidean, squared_euclidean, manhattan , divergence | unif , t , norm , cauchy | segmented | 0.3221 | 21314.527 | 27686.21 | 3852906355 | 150.6293 | 0.0729833 | 0.2517833 | 0.8944333 | 0.9096500 | 2.427033 | 0.9000000 | 58881.24 | 5.969633 | 0.9195667 | 0.415125 |
19 | 10 | 11 | divergence, lorentzian, avg , divergence | unif , cauchy, t , t | sampled | 0.3141 | 16704.570 | 22770.39 | 2858655513 | 125.4290 | 0.3230917 | 0.4767500 | 1.3860333 | 1.3847250 | 6.415000 | 0.9005417 | 44024.11 | 5.999208 | 1.4149083 | 0.412825 |
13 | 10 | 11 | jaccard , manhattan , manhattan , squared_euclidean | t , unif , norm , logis | segmented | 0.3113 | 17114.695 | 22761.29 | 2882006692 | 121.8863 | 0.0618917 | 0.3267000 | 1.0032333 | 1.0150667 | 3.762358 | 0.7904833 | 42200.34 | 4.412850 | 1.0009500 | 0.412900 |
8 | 10 | 12 | manhattan, dice , jaccard , clark | logis, unif , unif , unif | sampled | 0.3111 | 20903.351 | 27520.07 | 3911016451 | 149.3229 | 0.1429667 | 0.3831917 | 1.1382167 | 1.1561833 | 3.019733 | 0.9337500 | 59411.88 | 5.608683 | 1.1407917 | 0.410475 |
10 | 10 | 10 | euclidean, dice , gower , clark | t , unif, norm, t | sampled | 0.3101 | 18158.902 | 23923.00 | 3130861792 | 124.1952 | 0.1044000 | 0.3760917 | 1.0548583 | 1.0641167 | 2.388600 | 0.7956083 | 43920.66 | 4.363192 | 1.0814833 | 0.408725 |
26 | 10 | 10 | jaccard , euclidean, manhattan, manhattan | cauchy, norm , cauchy, logis | segmented | 0.3047 | 16410.598 | 21867.83 | 2677230537 | 121.1541 | 0.3268500 | 0.4506000 | 1.2629000 | 1.2809917 | 4.871458 | 0.8872583 | 40329.55 | 6.419142 | 1.2753417 | 0.410925 |
4 | 10 | 11 | squared_euclidean, manhattan , chebyshev , squared_euclidean | norm , norm , t , logis | sampled | 0.3046 | 16112.183 | 22306.59 | 2859873131 | 125.4395 | 0.3063333 | 0.4804750 | 1.3174083 | 1.3464500 | 5.488875 | 0.9131583 | 44011.37 | 6.083217 | 1.2506333 | 0.411525 |
20 | 10 | 11 | clark , squared_euclidean, dice , jaccard | logis , t , cauchy, norm | segmented | 0.3043 | 18671.906 | 23237.56 | 2897829115 | 125.8602 | 0.0189583 | 0.2398667 | 0.7983250 | 0.8217750 | 1.716642 | 0.7788000 | 43823.29 | 4.944067 | 0.7845167 | 0.410050 |
1 | 10 | 3 | squared_euclidean, avg , jaccard , clark | unif , logis , t , cauchy | segmented | 0.2836 | 14746.081 | 28037.97 | 3689354573 | 144.1110 | -0.0083500 | 0.2686083 | 0.8520917 | 0.8745417 | 0.986350 | 1.0216917 | 47558.20 | 2.998275 | 0.8459333 | 0.400225 |
7 | 10 | 4 | gower , chebyshev , jaccard , lorentzian | unif , cauchy, norm , logis | sampled | 0.2832 | 13682.540 | 27219.29 | 3313591787 | 136.1868 | 0.0292417 | 0.2736250 | 0.8827333 | 0.8840667 | 4.896592 | 0.9731667 | 40335.16 | 2.081983 | 0.8850333 | 0.404625 |
6 | 10 | 6 | manhattan, jaccard , jaccard , manhattan | logis , cauchy, t , logis | segmented | 0.2818 | 16348.193 | 23159.64 | 3003857753 | 118.5173 | 0.0695500 | 0.2929000 | 0.9023583 | 0.9061500 | 1.948842 | 0.8106750 | 41266.66 | 3.647008 | 0.9042333 | 0.403950 |
24 | 10 | 6 | divergence, chebyshev , euclidean , jaccard | logis , t , unif , cauchy | segmented | 0.2814 | 19613.034 | 24599.61 | 3192378632 | 125.9653 | 0.0099000 | 0.2517083 | 0.8301583 | 0.8375750 | 1.235558 | 0.8155750 | 44400.91 | 3.622492 | 0.8207833 | 0.404550 |
15 | 10 | 6 | dice , squared_euclidean, lorentzian , gower | norm , norm , logis , cauchy | segmented | 0.2792 | 16480.062 | 24622.54 | 3001635804 | 126.4097 | 0.0932500 | 0.3157417 | 0.9660583 | 0.9671333 | 1.887392 | 0.8565750 | 41707.85 | 3.743300 | 0.9663833 | 0.400250 |
21 | 10 | 6 | euclidean , lorentzian, divergence, dice | norm , norm , logis, unif | segmented | 0.2789 | 18024.776 | 25531.88 | 3200683307 | 129.7421 | 0.1030250 | 0.3189167 | 0.9751417 | 0.9908750 | 2.572142 | 0.8687333 | 44275.82 | 3.736633 | 0.9669250 | 0.404450 |
25 | 10 | 4 | squared_euclidean, avg , lorentzian , euclidean | cauchy, norm , logis , unif | segmented | 0.2785 | 12130.162 | 23087.40 | 2383872079 | 121.6489 | 0.0741083 | 0.2461000 | 0.8437583 | 0.8447333 | 4.489733 | 0.8850750 | 33305.20 | 2.638383 | 0.8428500 | 0.400625 |
17 | 10 | 4 | manhattan , avg , lorentzian, jaccard | norm, unif, norm, norm | segmented | 0.2781 | 12301.796 | 23754.82 | 2450826428 | 125.1309 | -0.0038083 | 0.2230583 | 0.7880917 | 0.7961417 | 2.598367 | 0.8843667 | 34023.93 | 2.012933 | 0.7949000 | 0.399475 |
12 | 10 | 6 | chebyshev , lorentzian , squared_euclidean, lorentzian | unif , cauchy, logis , t | sampled | 0.2739 | 17371.732 | 25927.04 | 3345945952 | 130.6321 | 0.1233417 | 0.2987417 | 0.9742167 | 0.9952083 | 2.534117 | 0.8719083 | 45256.19 | 4.184283 | 0.9421250 | 0.398375 |
11 | 10 | 9 | divergence, gower , divergence, avg | unif, norm, norm, norm | sampled | 0.2734 | 13850.352 | 22349.83 | 2572308960 | 121.4817 | 0.1974583 | 0.3862917 | 1.1511417 | 1.1691583 | 1.832242 | 0.8707667 | 38639.23 | 5.269725 | 1.1161750 | 0.400675 |
9 | 10 | 3 | clark , squared_euclidean, lorentzian , euclidean | logis, norm , norm , norm | segmented | 0.2677 | 14583.322 | 26731.70 | 3077042706 | 137.4450 | -0.0168500 | 0.2498917 | 0.8123667 | 0.8331167 | 0.907725 | 0.9657833 | 41656.50 | 2.706167 | 0.7991500 | 0.399800 |
2 | 10 | 7 | squared_euclidean, gower , gower , jaccard | unif , cauchy, logis , cauchy | segmented | 0.2630 | 10007.993 | 22191.76 | 2351898127 | 119.4387 | 0.1457667 | 0.3390417 | 1.0095583 | 1.0215750 | 2.700450 | 0.8889500 | 33666.91 | 3.490225 | 0.9977000 | 0.400625 |
29 | 10 | 3 | clark , divergence, euclidean , manhattan | logis, norm , logis, t | sampled | 0.2622 | 15796.921 | 32456.63 | 4906333232 | 158.6052 | 0.0310083 | 0.2834833 | 0.9147833 | 0.9340333 | 1.038992 | 1.1061583 | 57719.86 | 3.493758 | 0.8941833 | 0.391925 |
22 | 10 | 7 | clark, avg , clark, clark | cauchy, cauchy, cauchy, cauchy | segmented | 0.2602 | 8311.886 | 22337.80 | 2400231431 | 118.9371 | 0.0999000 | 0.3277667 | 0.9624250 | 0.9900500 | 3.015133 | 0.8636000 | 33836.45 | 2.692808 | 0.9061750 | 0.397750 |
3 | 10 | 3 | manhattan , lorentzian, gower , manhattan | norm , unif , t , cauchy | sampled | 0.2568 | 15622.599 | 29553.00 | 4258069367 | 147.2335 | 0.0552750 | 0.2650417 | 0.8884667 | 0.8856417 | 1.156475 | 1.0749750 | 50269.85 | 3.993958 | 0.9504750 | 0.391875 |
18 | 10 | 5 | chebyshev , jaccard , euclidean , divergence | t , logis , norm , cauchy | sampled | 0.2562 | 15154.460 | 25712.29 | 3198191852 | 129.1797 | 0.0251667 | 0.3067583 | 1.0052250 | 1.0474833 | 3.763325 | 0.9672917 | 38661.82 | 3.017925 | 0.9514333 | 0.391150 |
23 | 10 | 7 | squared_euclidean, euclidean , avg , divergence | unif, unif, t , norm | sampled | 0.2533 | 7467.247 | 23063.54 | 2615772067 | 125.6104 | 0.0173417 | 0.3421250 | 0.9611667 | 0.9802083 | 2.079517 | 0.9422583 | 38082.82 | 1.801075 | 0.9447750 | 0.397625 |
14 | 10 | 7 | euclidean , manhattan , lorentzian , squared_euclidean | unif , t , cauchy, norm | sampled | 0.2518 | 10086.010 | 25697.69 | 3265549914 | 136.3572 | 0.0296167 | 0.3087833 | 0.9456583 | 0.9535667 | 2.866708 | 0.9617750 | 43575.49 | 2.734808 | 0.9532917 | 0.392175 |
27 | 10 | 5 | jaccard, jaccard, jaccard, jaccard | norm , logis, logis, unif | sampled | 0.2484 | 13137.857 | 28353.72 | 3225147052 | 146.0960 | 0.0152500 | 0.3746167 | 1.1959500 | 1.2366917 | 4.392333 | 1.0980667 | 41700.49 | 1.545033 | 1.1141917 | 0.391175 |
For each time-feature you get a prediction set including min, max, q25, median, q75, selected ci, mean, sd, mode, skewness, kurtosis, and some other stuff.
min | 10% | 25% | 50% | 75% | 90% | max | mean | sd | mode | kurtosis | skewness | iqr_to_range | risk_ratio | upside_prob | divergence | entropy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-08-12 | 5426 | 24353 | 49122 | 57442 | 63024.00 | 88537 | 116669 | 56837.22 | 23754.19 | 57031.57 | 3.818 | 0.306 | 0.1250 | 1.1386 | NA | NA | 6.8119 |
2021-08-13 | 16757 | 26982 | 43028 | 55747 | 64010.50 | 93352 | 119833 | 56468.04 | 23692.99 | 53143.28 | 3.240 | 0.617 | 0.2036 | 1.6437 | 0.479 | 0.967 | 6.8201 |
2021-08-14 | 18827 | 33989 | 51325 | 65350 | 79478.25 | 92508 | 107359 | 63992.64 | 22187.94 | 70221.04 | 2.511 | -0.112 | 0.3180 | 0.9030 | 0.576 | 0.968 | 6.8431 |
2021-08-15 | 14728 | 28533 | 44138 | 61499 | 85709.00 | 93700 | 101019 | 62822.70 | 23614.00 | 72850.62 | 2.053 | -0.194 | 0.4818 | 0.8450 | 0.483 | 0.939 | 6.8308 |
2021-08-16 | 1471 | 26987 | 48598 | 73767 | 93032.00 | 101557 | 109806 | 69899.14 | 28609.81 | 86740.62 | 2.922 | -0.748 | 0.4102 | 0.4985 | 0.579 | 0.966 | 6.7976 |
2021-08-17 | 0 | 39884 | 59310 | 70565 | 89105.00 | 96144 | 107529 | 67311.95 | 24794.93 | 65599.64 | 4.255 | -1.014 | 0.2771 | 0.5238 | 0.497 | 0.971 | 6.8096 |
2021-08-18 | 42959 | 48356 | 60462 | 64764 | 73743.00 | 88481 | 92891 | 66227.01 | 12343.20 | 65050.28 | 2.938 | 0.428 | 0.2660 | 1.2899 | 0.445 | 0.939 | 6.8906 |
2021-08-19 | 769 | 24419 | 55871 | 70668 | 83738.00 | 102118 | 131961 | 68662.12 | 27682.93 | 65640.04 | 3.588 | -0.411 | 0.2124 | 0.8769 | 0.522 | 0.935 | 6.8054 |
2021-08-20 | 19014 | 22031 | 51565 | 67262 | 82704.00 | 107148 | 137999 | 67157.24 | 29215.31 | 70339.14 | 3.028 | 0.305 | 0.2617 | 1.4661 | 0.451 | 0.967 | 6.8083 |
2021-08-21 | 23993 | 35491 | 64189 | 77238 | 92214.00 | 106561 | 109183 | 73498.89 | 23740.29 | 74166.01 | 2.582 | -0.567 | 0.3290 | 0.6000 | 0.563 | 0.967 | 6.8490 |
min | 10% | 25% | 50% | 75% | 90% | max | mean | sd | mode | kurtosis | skewness | iqr_to_range | risk_ratio | upside_prob | divergence | entropy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-08-12 | 43 | 124.0 | 177 | 198 | 273 | 344 | 488 | 223.259 | 86.213 | 208.404 | 4.088 | 0.732 | 0.2157 | 1.8710 | NA | NA | 6.8335 |
2021-08-13 | 0 | 59.0 | 170 | 280 | 372 | 430 | 722 | 280.409 | 148.491 | 298.314 | 4.292 | 0.601 | 0.2798 | 1.5786 | 0.619 | 0.940 | 6.7522 |
2021-08-14 | 0 | 89.0 | 151 | 290 | 383 | 455 | 527 | 274.164 | 140.235 | 273.794 | 2.111 | 0.044 | 0.4402 | 0.8172 | 0.450 | 0.967 | 6.7581 |
2021-08-15 | 0 | 90.0 | 179 | 339 | 408 | 457 | 555 | 305.297 | 140.056 | 384.788 | 2.211 | -0.266 | 0.4126 | 0.6372 | 0.527 | 0.973 | 6.7813 |
2021-08-16 | 110 | 182.0 | 243 | 374 | 410 | 477 | 569 | 344.042 | 114.698 | 300.148 | 2.311 | -0.164 | 0.3638 | 0.7386 | 0.570 | 0.968 | 6.8481 |
2021-08-17 | 116 | 195.9 | 245 | 293 | 336 | 374 | 546 | 291.050 | 81.015 | 312.012 | 4.389 | 0.393 | 0.2116 | 1.4294 | 0.363 | 0.930 | 6.8684 |
2021-08-18 | 121 | 160.0 | 236 | 296 | 323 | 412 | 606 | 298.485 | 104.728 | 283.904 | 4.348 | 1.004 | 0.1794 | 1.7714 | 0.530 | 0.932 | 6.8492 |
2021-08-19 | 66 | 210.0 | 242 | 325 | 390 | 534 | 660 | 331.791 | 126.857 | 312.846 | 3.271 | 0.478 | 0.2492 | 1.2934 | 0.588 | 0.963 | 6.8331 |
2021-08-20 | 76 | 189.0 | 301 | 350 | 441 | 523 | 685 | 367.136 | 133.115 | 374.786 | 3.251 | 0.105 | 0.2299 | 1.2226 | 0.557 | 0.969 | 6.8372 |
2021-08-21 | 94 | 213.0 | 282 | 395 | 507 | 596 | 847 | 398.656 | 155.658 | 381.836 | 3.336 | 0.544 | 0.2988 | 1.5017 | 0.587 | 0.965 | 6.8311 |
min | 10% | 25% | 50% | 75% | 90% | max | mean | sd | mode | kurtosis | skewness | iqr_to_range | risk_ratio | upside_prob | divergence | entropy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-08-12 | 35485645 | 35489335 | 35497105 | 35516929 | 35535973 | 35560301 | 35580609 | 35521811 | 27643.39 | 35499133 | 2.189 | 0.479 | 0.4093 | 2.0355 | NA | NA | 6.9078 |
2021-08-13 | 35533703 | 35548256 | 35551631 | 35580041 | 35605258 | 35633338 | 35646091 | 35582272 | 32777.50 | 35562246 | 1.893 | 0.461 | 0.4772 | 1.4254 | 0.925 | 0.970 | 6.9078 |
2021-08-14 | 35592139 | 35599710 | 35610262 | 35631171 | 35668750 | 35715812 | 35759069 | 35642998 | 41604.82 | 35617820 | 2.714 | 0.909 | 0.3504 | 3.2767 | 0.873 | 0.975 | 6.9078 |
2021-08-15 | 35618596 | 35657366 | 35678647 | 35702886 | 35761391 | 35813094 | 35865737 | 35717879 | 57035.21 | 35690092 | 2.480 | 0.731 | 0.3348 | 1.9320 | 0.853 | 0.937 | 6.9078 |
2021-08-16 | 35618596 | 35718536 | 35749159 | 35785851 | 35855134 | 35911810 | 35999818 | 35801578 | 71269.40 | 35775037 | 2.354 | 0.423 | 0.2780 | 1.2793 | 0.815 | 0.937 | 6.9078 |
2021-08-17 | 35670208 | 35786704 | 35825398 | 35881461 | 35944378 | 35988341 | 36211134 | 35888093 | 78394.55 | 35859400 | 2.723 | 0.288 | 0.2200 | 1.5606 | 0.800 | 0.956 | 6.9078 |
2021-08-18 | 35752851 | 35872288 | 35918153 | 35979196 | 36039878 | 36091364 | 36421700 | 35981841 | 91931.97 | 35983776 | 4.031 | 0.569 | 0.1820 | 1.9550 | 0.766 | 0.952 | 6.9078 |
2021-08-19 | 35825424 | 35954971 | 36008715 | 36070997 | 36151153 | 36222946 | 36449889 | 36082467 | 104734.84 | 36070271 | 3.207 | 0.497 | 0.2281 | 1.5429 | 0.765 | 0.969 | 6.9078 |
2021-08-20 | 35884445 | 36037363 | 36100740 | 36186983 | 36283382 | 36365146 | 37046644 | 36197637 | 134993.68 | 36164069 | 4.553 | 0.677 | 0.1572 | 2.8415 | 0.761 | 0.926 | 6.9077 |
2021-08-21 | 35894202 | 36127844 | 36204896 | 36296872 | 36405872 | 36507060 | 37046644 | 36311494 | 154204.39 | 36291665 | 3.742 | 0.613 | 0.1744 | 1.8620 | 0.714 | 0.941 | 6.9077 |
min | 10% | 25% | 50% | 75% | 90% | max | mean | sd | mode | kurtosis | skewness | iqr_to_range | risk_ratio | upside_prob | divergence | entropy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-08-12 | 747107 | 747126.0 | 747166.0 | 747228.0 | 747307.0 | 747454.0 | 747609 | 747258.3 | 122.890 | 747207.9 | 3.852 | 1.134 | 0.2809 | 3.1488 | NA | NA | 6.9078 |
2021-08-13 | 747239 | 747290.0 | 747369.0 | 747452.0 | 747609.0 | 747772.0 | 747934 | 747498.2 | 181.640 | 747391.3 | 3.117 | 0.925 | 0.3453 | 2.2629 | 0.866 | 0.958 | 6.9078 |
2021-08-14 | 747293 | 747454.0 | 747532.8 | 747737.0 | 747911.0 | 748063.0 | 748494 | 747749.9 | 238.886 | 747646.9 | 2.659 | 0.476 | 0.3149 | 1.7050 | 0.794 | 0.954 | 6.9078 |
2021-08-15 | 747392 | 747710.0 | 747811.0 | 748027.0 | 748260.0 | 748390.0 | 749055 | 748041.2 | 277.810 | 747904.2 | 2.683 | 0.281 | 0.2700 | 1.6189 | 0.777 | 0.954 | 6.9078 |
2021-08-16 | 747489 | 748008.0 | 748179.0 | 748363.5 | 748600.2 | 748791.0 | 749638 | 748391.3 | 308.405 | 748428.0 | 2.938 | 0.234 | 0.1960 | 1.4574 | 0.809 | 0.916 | 6.9078 |
2021-08-17 | 747710 | 748285.9 | 748457.8 | 748698.5 | 748923.5 | 749160.1 | 749896 | 748707.9 | 344.804 | 748664.9 | 3.197 | 0.219 | 0.2131 | 1.2114 | 0.729 | 0.917 | 6.9078 |
2021-08-18 | 747936 | 748550.0 | 748777.5 | 749017.0 | 749294.5 | 749542.3 | 750288 | 749039.9 | 393.581 | 748988.4 | 3.074 | 0.282 | 0.2198 | 1.1758 | 0.729 | 0.876 | 6.9078 |
2021-08-19 | 748251 | 748859.4 | 749073.0 | 749357.0 | 749713.2 | 750014.3 | 751085 | 749401.6 | 453.568 | 749221.2 | 3.012 | 0.395 | 0.2259 | 1.5624 | 0.712 | 0.890 | 6.9078 |
2021-08-20 | 748320 | 749133.8 | 749424.8 | 749825.5 | 750218.0 | 750585.0 | 751998 | 749842.4 | 562.417 | 749779.9 | 3.106 | 0.331 | 0.2157 | 1.4430 | 0.729 | 0.888 | 6.9078 |
2021-08-21 | 748534 | 749503.9 | 749859.0 | 750314.0 | 750785.5 | 751253.2 | 752516 | 750345.4 | 670.457 | 750238.6 | 2.853 | 0.246 | 0.2327 | 1.2371 | 0.700 | 0.908 | 6.9078 |
In version 1.1, IQR to range, risk ratio, upside probability, divergence and entropy, have been added directly inside the prediction table and the terminal values (calculation at both ends of each sequence) have been dismissed: Just a couple of brief explanation here:
1. IQR to range: well, almost self explanatory, the normalization of IQR to min-max range allows for comparison among different time features;
2. risk ratio: here we mean the ratio between the range above median and the range below (in financial series allows you to understand how deep is the precipice even when the trend is going up);
3. upside probability: no brainer, probability of getting a larger value compared to the previous point, easy (an annotation here: in most cases the value is around 50%);
4. divergence: we dismissed the average Kullback-Leibler divergence for a simpler measure of divergence, quite similar to Chebyshev distance: in our humble case, the max distance between subsequent ecdf;
5. entropy: a last new entry from a specific package3 (it could be of interest to understand how entropy evolves in long-term forecasting and how entropy is related to good or bad predictions).
The only aim of all these “odd” stats is to better understand where aleatoric uncertainty ends, and where epistemic uncertainty begins.
For each time features included in the model, you get a plot of the median values with the chosen confidence interval (ci
default is 0.8).
$best$plot
example$daily_cases
$daily_deaths
$cumulative_cases
$cumulative_deaths
Each model error is tested multiple times according to n_windows
. The test_errors
table includes the averaged values along all the tests (in version 1.1, we updated the metrics using a new package4).
me | mae | mse | rmsse | mpe | mape | rmae | rrmse | rame | mase | smse | sce | gmrae | pred_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
daily_cases | 9231.3266 | 20657.3955 | 6.983004e+08 | 220.45853 | 0.1995667 | 0.4724333 | 1.2839667 | 1.2697667 | 4.23550 | 1.6401333 | 55535.7380 | 7.464233 | 1.442633 | 0.5017 |
daily_deaths | 217.1158 | 236.4803 | 8.024105e+04 | 16.86817 | 1.2566000 | 1.4056667 | 3.3287000 | 3.3603000 | 13.99993 | 0.9014000 | 305.7909 | 8.302200 | 3.191200 | 0.4991 |
cumulative_cases | 73399.7667 | 88200.9325 | 1.434174e+10 | 347.58097 | 0.0021667 | 0.0026333 | 0.3805000 | 0.3828667 | 0.37350 | 1.0680667 | 175028.7794 | 8.841033 | 0.383700 | 0.3402 |
cumulative_deaths | 256.5057 | 382.0029 | 2.359607e+05 | 11.12823 | 0.0003667 | 0.0004333 | 0.3594333 | 0.3614000 | 0.33770 | 0.2578667 | 159.1135 | 1.775167 | 0.363900 | 0.3349 |
This is the link to the useful philentropy package: https://cran.r-project.org/web/packages/philentropy/index.html↩︎
imputeTS package may be found here: https://cran.r-project.org/web/packages/imputeTS/index.html↩︎
We used the entropy package base options. For any information, you can look here: https://cran.r-project.org/web/packages/entropy/index.html↩︎
In version 1.1 we decided to use the greybox package, well-focused on time series analysis, for the interesting option of choosing scale and benchmark of some error measures. The link: https://cran.r-project.org/web/packages/greybox/index.html↩︎