Jenga: a brief introduction

An introductory example

In our introduction to Jenga, we are going to use the covid_in_europe (from www.ecdc.europa.eu). As showed below the time features are expected in (ordered) columns in a dataframe format.

Covid numbers in Europe
date	daily_cases	daily_deaths	cumulative_cases	cumulative_deaths
2021-03-02	102125	1973	22724397	549967
2021-03-03	117049	2755	22841446	552722
2021-03-04	133743	2461	22975189	555183
2021-03-05	133057	2724	23108246	557907
2021-03-06	126833	2111	23235079	560018
2021-03-07	103745	1527	23338824	561545
2021-03-08	88904	1535	23427728	563080
2021-03-09	107841	2157	23535569	565237
2021-03-10	129334	2580	23664903	567817
2021-03-11	148639	2416	23813542	570233

In the example, we are predicting four time features at the same time, projecting each feature for 10 steps in the future, 30 possible models are explored through random search and their error is tested with 30 windows. All the other hyper-parameters are randomly sampled from the complete set of possible values (but the user may select his own desired values). NA values are dealt with kalman filter using the imputeTS package².

library(jenga)
example <- jenga(covid_in_europe[, 2:5], seq_len = 10, n_sample = 30, search = "random", seed = 42, n_windows = 30, dates = covid_in_europe$date)
  time: 129.94 sec elapsed

The result is a list of different components, as you can see below.

names(example)
  [1] "exploration" "history"     "best"        "time_log"

The history variable includes the models explored during the grid/random search. The best model is being selected according the highest prediction score, the lowest mean error and the lower mean absolute error. The prediction score measures the distance between the ground truth and the expected median: when the actual sequences derange from the prediction intervals, the prediction score is close to 0; when the actual sequences are close to the median of the prediction intervals, the prediction score gets close to 1 (quite a challenge to get above 0.5).

Models explored during Random Search
	seq_len	k	dist_method	kernel	mode	pred_scores	avg_me	avg_mae	avg_mse	avg_rmsse	avg_mpe	avg_mape	avg_rmae	avg_rrmse	avg_rame	avg_mase	avg_smse	avg_sce	avg_gmrae	avg_pred_score
5	10	12	dice , euclidean , lorentzian, clark	unif , cauchy, norm , t	segmented	0.3349	20776.179	27369.20	3760088863	149.0090	0.3646750	0.4702917	1.3381500	1.3435833	4.736658	0.9668667	57757.36	6.595658	1.3453583	0.418975
16	10	12	divergence, gower , dice , clark	t , unif, unif, t	sampled	0.3314	19322.572	26183.73	3491987983	139.1536	0.1139333	0.3197833	0.9622417	0.9786167	2.654383	0.8369750	52151.08	4.812492	0.9301083	0.416425
28	10	12	gower , manhattan, clark , dice	t , unif , cauchy, norm	segmented	0.3263	19826.223	26930.88	3693082086	143.3051	0.0814750	0.2892583	0.9802667	1.0110833	2.689150	0.8770500	54197.78	5.218625	0.9373167	0.416900
30	10	12	squared_euclidean, squared_euclidean, manhattan , divergence	unif , t , norm , cauchy	segmented	0.3221	21314.527	27686.21	3852906355	150.6293	0.0729833	0.2517833	0.8944333	0.9096500	2.427033	0.9000000	58881.24	5.969633	0.9195667	0.415125
19	10	11	divergence, lorentzian, avg , divergence	unif , cauchy, t , t	sampled	0.3141	16704.570	22770.39	2858655513	125.4290	0.3230917	0.4767500	1.3860333	1.3847250	6.415000	0.9005417	44024.11	5.999208	1.4149083	0.412825
13	10	11	jaccard , manhattan , manhattan , squared_euclidean	t , unif , norm , logis	segmented	0.3113	17114.695	22761.29	2882006692	121.8863	0.0618917	0.3267000	1.0032333	1.0150667	3.762358	0.7904833	42200.34	4.412850	1.0009500	0.412900
8	10	12	manhattan, dice , jaccard , clark	logis, unif , unif , unif	sampled	0.3111	20903.351	27520.07	3911016451	149.3229	0.1429667	0.3831917	1.1382167	1.1561833	3.019733	0.9337500	59411.88	5.608683	1.1407917	0.410475
10	10	10	euclidean, dice , gower , clark	t , unif, norm, t	sampled	0.3101	18158.902	23923.00	3130861792	124.1952	0.1044000	0.3760917	1.0548583	1.0641167	2.388600	0.7956083	43920.66	4.363192	1.0814833	0.408725
26	10	10	jaccard , euclidean, manhattan, manhattan	cauchy, norm , cauchy, logis	segmented	0.3047	16410.598	21867.83	2677230537	121.1541	0.3268500	0.4506000	1.2629000	1.2809917	4.871458	0.8872583	40329.55	6.419142	1.2753417	0.410925
4	10	11	squared_euclidean, manhattan , chebyshev , squared_euclidean	norm , norm , t , logis	sampled	0.3046	16112.183	22306.59	2859873131	125.4395	0.3063333	0.4804750	1.3174083	1.3464500	5.488875	0.9131583	44011.37	6.083217	1.2506333	0.411525
20	10	11	clark , squared_euclidean, dice , jaccard	logis , t , cauchy, norm	segmented	0.3043	18671.906	23237.56	2897829115	125.8602	0.0189583	0.2398667	0.7983250	0.8217750	1.716642	0.7788000	43823.29	4.944067	0.7845167	0.410050
1	10	3	squared_euclidean, avg , jaccard , clark	unif , logis , t , cauchy	segmented	0.2836	14746.081	28037.97	3689354573	144.1110	-0.0083500	0.2686083	0.8520917	0.8745417	0.986350	1.0216917	47558.20	2.998275	0.8459333	0.400225
7	10	4	gower , chebyshev , jaccard , lorentzian	unif , cauchy, norm , logis	sampled	0.2832	13682.540	27219.29	3313591787	136.1868	0.0292417	0.2736250	0.8827333	0.8840667	4.896592	0.9731667	40335.16	2.081983	0.8850333	0.404625
6	10	6	manhattan, jaccard , jaccard , manhattan	logis , cauchy, t , logis	segmented	0.2818	16348.193	23159.64	3003857753	118.5173	0.0695500	0.2929000	0.9023583	0.9061500	1.948842	0.8106750	41266.66	3.647008	0.9042333	0.403950
24	10	6	divergence, chebyshev , euclidean , jaccard	logis , t , unif , cauchy	segmented	0.2814	19613.034	24599.61	3192378632	125.9653	0.0099000	0.2517083	0.8301583	0.8375750	1.235558	0.8155750	44400.91	3.622492	0.8207833	0.404550
15	10	6	dice , squared_euclidean, lorentzian , gower	norm , norm , logis , cauchy	segmented	0.2792	16480.062	24622.54	3001635804	126.4097	0.0932500	0.3157417	0.9660583	0.9671333	1.887392	0.8565750	41707.85	3.743300	0.9663833	0.400250
21	10	6	euclidean , lorentzian, divergence, dice	norm , norm , logis, unif	segmented	0.2789	18024.776	25531.88	3200683307	129.7421	0.1030250	0.3189167	0.9751417	0.9908750	2.572142	0.8687333	44275.82	3.736633	0.9669250	0.404450
25	10	4	squared_euclidean, avg , lorentzian , euclidean	cauchy, norm , logis , unif	segmented	0.2785	12130.162	23087.40	2383872079	121.6489	0.0741083	0.2461000	0.8437583	0.8447333	4.489733	0.8850750	33305.20	2.638383	0.8428500	0.400625
17	10	4	manhattan , avg , lorentzian, jaccard	norm, unif, norm, norm	segmented	0.2781	12301.796	23754.82	2450826428	125.1309	-0.0038083	0.2230583	0.7880917	0.7961417	2.598367	0.8843667	34023.93	2.012933	0.7949000	0.399475
12	10	6	chebyshev , lorentzian , squared_euclidean, lorentzian	unif , cauchy, logis , t	sampled	0.2739	17371.732	25927.04	3345945952	130.6321	0.1233417	0.2987417	0.9742167	0.9952083	2.534117	0.8719083	45256.19	4.184283	0.9421250	0.398375
11	10	9	divergence, gower , divergence, avg	unif, norm, norm, norm	sampled	0.2734	13850.352	22349.83	2572308960	121.4817	0.1974583	0.3862917	1.1511417	1.1691583	1.832242	0.8707667	38639.23	5.269725	1.1161750	0.400675
9	10	3	clark , squared_euclidean, lorentzian , euclidean	logis, norm , norm , norm	segmented	0.2677	14583.322	26731.70	3077042706	137.4450	-0.0168500	0.2498917	0.8123667	0.8331167	0.907725	0.9657833	41656.50	2.706167	0.7991500	0.399800
2	10	7	squared_euclidean, gower , gower , jaccard	unif , cauchy, logis , cauchy	segmented	0.2630	10007.993	22191.76	2351898127	119.4387	0.1457667	0.3390417	1.0095583	1.0215750	2.700450	0.8889500	33666.91	3.490225	0.9977000	0.400625
29	10	3	clark , divergence, euclidean , manhattan	logis, norm , logis, t	sampled	0.2622	15796.921	32456.63	4906333232	158.6052	0.0310083	0.2834833	0.9147833	0.9340333	1.038992	1.1061583	57719.86	3.493758	0.8941833	0.391925
22	10	7	clark, avg , clark, clark	cauchy, cauchy, cauchy, cauchy	segmented	0.2602	8311.886	22337.80	2400231431	118.9371	0.0999000	0.3277667	0.9624250	0.9900500	3.015133	0.8636000	33836.45	2.692808	0.9061750	0.397750
3	10	3	manhattan , lorentzian, gower , manhattan	norm , unif , t , cauchy	sampled	0.2568	15622.599	29553.00	4258069367	147.2335	0.0552750	0.2650417	0.8884667	0.8856417	1.156475	1.0749750	50269.85	3.993958	0.9504750	0.391875
18	10	5	chebyshev , jaccard , euclidean , divergence	t , logis , norm , cauchy	sampled	0.2562	15154.460	25712.29	3198191852	129.1797	0.0251667	0.3067583	1.0052250	1.0474833	3.763325	0.9672917	38661.82	3.017925	0.9514333	0.391150
23	10	7	squared_euclidean, euclidean , avg , divergence	unif, unif, t , norm	sampled	0.2533	7467.247	23063.54	2615772067	125.6104	0.0173417	0.3421250	0.9611667	0.9802083	2.079517	0.9422583	38082.82	1.801075	0.9447750	0.397625
14	10	7	euclidean , manhattan , lorentzian , squared_euclidean	unif , t , cauchy, norm	sampled	0.2518	10086.010	25697.69	3265549914	136.3572	0.0296167	0.3087833	0.9456583	0.9535667	2.866708	0.9617750	43575.49	2.734808	0.9532917	0.392175
27	10	5	jaccard, jaccard, jaccard, jaccard	norm , logis, logis, unif	sampled	0.2484	13137.857	28353.72	3225147052	146.0960	0.0152500	0.3746167	1.1959500	1.2366917	4.392333	1.0980667	41700.49	1.545033	1.1141917	0.391175

For each time-feature you get a prediction set including min, max, q25, median, q75, selected ci, mean, sd, mode, skewness, kurtosis, and some other stuff.

Prediction for Daily Cases
	min	10%	25%	50%	75%	90%	max	mean	sd	mode	kurtosis	skewness	iqr_to_range	risk_ratio	upside_prob	divergence	entropy
2021-08-12	5426	24353	49122	57442	63024.00	88537	116669	56837.22	23754.19	57031.57	3.818	0.306	0.1250	1.1386	NA	NA	6.8119
2021-08-13	16757	26982	43028	55747	64010.50	93352	119833	56468.04	23692.99	53143.28	3.240	0.617	0.2036	1.6437	0.479	0.967	6.8201
2021-08-14	18827	33989	51325	65350	79478.25	92508	107359	63992.64	22187.94	70221.04	2.511	-0.112	0.3180	0.9030	0.576	0.968	6.8431
2021-08-15	14728	28533	44138	61499	85709.00	93700	101019	62822.70	23614.00	72850.62	2.053	-0.194	0.4818	0.8450	0.483	0.939	6.8308
2021-08-16	1471	26987	48598	73767	93032.00	101557	109806	69899.14	28609.81	86740.62	2.922	-0.748	0.4102	0.4985	0.579	0.966	6.7976
2021-08-17	0	39884	59310	70565	89105.00	96144	107529	67311.95	24794.93	65599.64	4.255	-1.014	0.2771	0.5238	0.497	0.971	6.8096
2021-08-18	42959	48356	60462	64764	73743.00	88481	92891	66227.01	12343.20	65050.28	2.938	0.428	0.2660	1.2899	0.445	0.939	6.8906
2021-08-19	769	24419	55871	70668	83738.00	102118	131961	68662.12	27682.93	65640.04	3.588	-0.411	0.2124	0.8769	0.522	0.935	6.8054
2021-08-20	19014	22031	51565	67262	82704.00	107148	137999	67157.24	29215.31	70339.14	3.028	0.305	0.2617	1.4661	0.451	0.967	6.8083
2021-08-21	23993	35491	64189	77238	92214.00	106561	109183	73498.89	23740.29	74166.01	2.582	-0.567	0.3290	0.6000	0.563	0.967	6.8490

Prediction for Daily Deaths
	min	10%	25%	50%	75%	90%	max	mean	sd	mode	kurtosis	skewness	iqr_to_range	risk_ratio	upside_prob	divergence	entropy
2021-08-12	43	124.0	177	198	273	344	488	223.259	86.213	208.404	4.088	0.732	0.2157	1.8710	NA	NA	6.8335
2021-08-13	0	59.0	170	280	372	430	722	280.409	148.491	298.314	4.292	0.601	0.2798	1.5786	0.619	0.940	6.7522
2021-08-14	0	89.0	151	290	383	455	527	274.164	140.235	273.794	2.111	0.044	0.4402	0.8172	0.450	0.967	6.7581
2021-08-15	0	90.0	179	339	408	457	555	305.297	140.056	384.788	2.211	-0.266	0.4126	0.6372	0.527	0.973	6.7813
2021-08-16	110	182.0	243	374	410	477	569	344.042	114.698	300.148	2.311	-0.164	0.3638	0.7386	0.570	0.968	6.8481
2021-08-17	116	195.9	245	293	336	374	546	291.050	81.015	312.012	4.389	0.393	0.2116	1.4294	0.363	0.930	6.8684
2021-08-18	121	160.0	236	296	323	412	606	298.485	104.728	283.904	4.348	1.004	0.1794	1.7714	0.530	0.932	6.8492
2021-08-19	66	210.0	242	325	390	534	660	331.791	126.857	312.846	3.271	0.478	0.2492	1.2934	0.588	0.963	6.8331
2021-08-20	76	189.0	301	350	441	523	685	367.136	133.115	374.786	3.251	0.105	0.2299	1.2226	0.557	0.969	6.8372
2021-08-21	94	213.0	282	395	507	596	847	398.656	155.658	381.836	3.336	0.544	0.2988	1.5017	0.587	0.965	6.8311

Prediction for Cumulative Cases
	min	10%	25%	50%	75%	90%	max	mean	sd	mode	kurtosis	skewness	iqr_to_range	risk_ratio	upside_prob	divergence	entropy
2021-08-12	35485645	35489335	35497105	35516929	35535973	35560301	35580609	35521811	27643.39	35499133	2.189	0.479	0.4093	2.0355	NA	NA	6.9078
2021-08-13	35533703	35548256	35551631	35580041	35605258	35633338	35646091	35582272	32777.50	35562246	1.893	0.461	0.4772	1.4254	0.925	0.970	6.9078
2021-08-14	35592139	35599710	35610262	35631171	35668750	35715812	35759069	35642998	41604.82	35617820	2.714	0.909	0.3504	3.2767	0.873	0.975	6.9078
2021-08-15	35618596	35657366	35678647	35702886	35761391	35813094	35865737	35717879	57035.21	35690092	2.480	0.731	0.3348	1.9320	0.853	0.937	6.9078
2021-08-16	35618596	35718536	35749159	35785851	35855134	35911810	35999818	35801578	71269.40	35775037	2.354	0.423	0.2780	1.2793	0.815	0.937	6.9078
2021-08-17	35670208	35786704	35825398	35881461	35944378	35988341	36211134	35888093	78394.55	35859400	2.723	0.288	0.2200	1.5606	0.800	0.956	6.9078
2021-08-18	35752851	35872288	35918153	35979196	36039878	36091364	36421700	35981841	91931.97	35983776	4.031	0.569	0.1820	1.9550	0.766	0.952	6.9078
2021-08-19	35825424	35954971	36008715	36070997	36151153	36222946	36449889	36082467	104734.84	36070271	3.207	0.497	0.2281	1.5429	0.765	0.969	6.9078
2021-08-20	35884445	36037363	36100740	36186983	36283382	36365146	37046644	36197637	134993.68	36164069	4.553	0.677	0.1572	2.8415	0.761	0.926	6.9077
2021-08-21	35894202	36127844	36204896	36296872	36405872	36507060	37046644	36311494	154204.39	36291665	3.742	0.613	0.1744	1.8620	0.714	0.941	6.9077

Prediction for Cumulative Deaths
	min	10%	25%	50%	75%	90%	max	mean	sd	mode	kurtosis	skewness	iqr_to_range	risk_ratio	upside_prob	divergence	entropy
2021-08-12	747107	747126.0	747166.0	747228.0	747307.0	747454.0	747609	747258.3	122.890	747207.9	3.852	1.134	0.2809	3.1488	NA	NA	6.9078
2021-08-13	747239	747290.0	747369.0	747452.0	747609.0	747772.0	747934	747498.2	181.640	747391.3	3.117	0.925	0.3453	2.2629	0.866	0.958	6.9078
2021-08-14	747293	747454.0	747532.8	747737.0	747911.0	748063.0	748494	747749.9	238.886	747646.9	2.659	0.476	0.3149	1.7050	0.794	0.954	6.9078
2021-08-15	747392	747710.0	747811.0	748027.0	748260.0	748390.0	749055	748041.2	277.810	747904.2	2.683	0.281	0.2700	1.6189	0.777	0.954	6.9078
2021-08-16	747489	748008.0	748179.0	748363.5	748600.2	748791.0	749638	748391.3	308.405	748428.0	2.938	0.234	0.1960	1.4574	0.809	0.916	6.9078
2021-08-17	747710	748285.9	748457.8	748698.5	748923.5	749160.1	749896	748707.9	344.804	748664.9	3.197	0.219	0.2131	1.2114	0.729	0.917	6.9078
2021-08-18	747936	748550.0	748777.5	749017.0	749294.5	749542.3	750288	749039.9	393.581	748988.4	3.074	0.282	0.2198	1.1758	0.729	0.876	6.9078
2021-08-19	748251	748859.4	749073.0	749357.0	749713.2	750014.3	751085	749401.6	453.568	749221.2	3.012	0.395	0.2259	1.5624	0.712	0.890	6.9078
2021-08-20	748320	749133.8	749424.8	749825.5	750218.0	750585.0	751998	749842.4	562.417	749779.9	3.106	0.331	0.2157	1.4430	0.729	0.888	6.9078
2021-08-21	748534	749503.9	749859.0	750314.0	750785.5	751253.2	752516	750345.4	670.457	750238.6	2.853	0.246	0.2327	1.2371	0.700	0.908	6.9078

In version 1.1, IQR to range, risk ratio, upside probability, divergence and entropy, have been added directly inside the prediction table and the terminal values (calculation at both ends of each sequence) have been dismissed: Just a couple of brief explanation here:

1. IQR to range: well, almost self explanatory, the normalization of IQR to min-max range allows for comparison among different time features;

2. risk ratio: here we mean the ratio between the range above median and the range below (in financial series allows you to understand how deep is the precipice even when the trend is going up);

3. upside probability: no brainer, probability of getting a larger value compared to the previous point, easy (an annotation here: in most cases the value is around 50%);

4. divergence: we dismissed the average Kullback-Leibler divergence for a simpler measure of divergence, quite similar to Chebyshev distance: in our humble case, the max distance between subsequent ecdf;

5. entropy: a last new entry from a specific package³ (it could be of interest to understand how entropy evolves in long-term forecasting and how entropy is related to good or bad predictions).

The only aim of all these “odd” stats is to better understand where aleatoric uncertainty ends, and where epistemic uncertainty begins.

For each time features included in the model, you get a plot of the median values with the chosen confidence interval (ci default is 0.8).

example$best$plot
  $daily_cases

  
  $daily_deaths

  
  $cumulative_cases

  
  $cumulative_deaths

Each model error is tested multiple times according to n_windows. The test_errors table includes the averaged values along all the tests (in version 1.1, we updated the metrics using a new package⁴).

Testing errors averaged along n_windows
	me	mae	mse	rmsse	mpe	mape	rmae	rrmse	rame	mase	smse	sce	gmrae	pred_score
daily_cases	9231.3266	20657.3955	6.983004e+08	220.45853	0.1995667	0.4724333	1.2839667	1.2697667	4.23550	1.6401333	55535.7380	7.464233	1.442633	0.5017
daily_deaths	217.1158	236.4803	8.024105e+04	16.86817	1.2566000	1.4056667	3.3287000	3.3603000	13.99993	0.9014000	305.7909	8.302200	3.191200	0.4991
cumulative_cases	73399.7667	88200.9325	1.434174e+10	347.58097	0.0021667	0.0026333	0.3805000	0.3828667	0.37350	1.0680667	175028.7794	8.841033	0.383700	0.3402
cumulative_deaths	256.5057	382.0029	2.359607e+05	11.12823	0.0003667	0.0004333	0.3594333	0.3614000	0.33770	0.2578667	159.1135	1.775167	0.363900	0.3349

This is the link to the useful philentropy package: https://cran.r-project.org/web/packages/philentropy/index.html ↩︎
imputeTS package may be found here: https://cran.r-project.org/web/packages/imputeTS/index.html ↩︎
We used the entropy package base options. For any information, you can look here: https://cran.r-project.org/web/packages/entropy/index.html ↩︎
In version 1.1 we decided to use the greybox package, well-focused on time series analysis, for the interesting option of choosing scale and benchmark of some error measures. The link: https://cran.r-project.org/web/packages/greybox/index.html ↩︎

Jenga: a brief introduction

Giancarlo Vercellino

17-April-2022

What is Jenga

An introductory example