naive: a brief introduction

Giancarlo Vercellino

21-May-2022

“Everything should be made as simple as possible, but not simpler.” (Einstein)

“For every problem there is a solution which is simple, clean and wrong.” (Henry Louis Mencken)

“From naive simplicity we arrive at more profound simplicity.” (Albert Schweitzer)

The simplest baseline

naive is the simplest model you can imagine for forecasting a sequence of values: it selects the most common “patterns” of sequences to propose an empirical extrapolation for the future sequence. The idea behind naive is to provide a minimal benchmark against which to compare more sophisticated models (both in terms of computation time and prediction test error).

A brief overview on the process:

  1. Some basic transformations are handled in the background. Differencing and integration are managed automatically by naive, using the maximal p-value in a recursive F-test to de-trend each time feature: this makes it easy to determine the dynamic characteristics of each time feature (random walk, trend, exponential), and is somewhat simpler and more practical than formal approaches such as the Augmented Dickey-Fuller or Ljung-Box tests. If your time features have a limited number of missing values, naive automatically imputes them using the Kalman filter method [1]. If you prefer to project the smoothed version into the future, set smoother = TRUE to use the loess function [2].

  2. After differencing, each time feature is reframed according to sequence length and stride (it is segmented into sequences of length seq_len, each shifted by stride), defining a data set of temporal segments that may overlap (when stride is less than seq_len) or not (when stride is equal to or greater than seq_len). A distance metric is computed between every pair of sequences, and a location parameter of the distance matrix is calculated (mean, median or mode, diagonal excluded): the most recurring patterns are extracted by selecting the sequences whose distances fall inside a percentile cover built around the location parameter. The selected patterns are used as an empirical distribution to simulate possible future values and calculate the quantile predictions.

  3. The test errors are cross-validated through expanding validation windows (n_windows): the default value is 10, meaning that the time features are divided into 10 + 1 segments, guaranteeing at least ten validation sets to measure the error on unseen data. For each point in the prediction sequence, a thousand samples are collected for the calculation of quantiles, mean, mode, standard deviation, skewness and kurtosis, and other less common measures (see below).
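
Steps 1 and 2 above can be pictured with a few lines of base R. This is only a minimal sketch under stated assumptions, not the package's internal code: segment_series is a hypothetical helper, the toy series is made up, and the Euclidean distance with a median location is just one of the possible method/location combinations.

```r
# Toy series (illustrative values, not taken from the package data)
x <- c(10, 12, 15, 14, 18, 21, 20, 25, 27, 30, 29)

# Step 1: first-order differencing, then integration back to the original scale
dx <- diff(x)
x_back <- diffinv(dx, xi = x[1])

# Step 2: hypothetical helper that cuts a series into windows of length
# seq_len, each shifted by stride (windows overlap when stride < seq_len)
segment_series <- function(x, seq_len, stride) {
  starts <- seq(1, length(x) - seq_len + 1, by = stride)
  do.call(rbind, lapply(starts, function(s) x[s:(s + seq_len - 1)]))
}

segs <- segment_series(dx, seq_len = 4, stride = 3)

# Pairwise distances between segments; the location parameter is computed
# on the off-diagonal entries only (here: Euclidean distance, median)
d <- as.matrix(dist(segs, method = "euclidean"))
location <- median(d[upper.tri(d)])
```

With seq_len = 4 and stride = 3, the ten differenced values yield three segments that overlap by one point; diffinv recovers the original series from the differences and the first value.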

The process flow of naive


For example, let’s look at tech giants’ stocks …

The time_features dataset included with naive is a recent snapshot of some Big Tech stock prices (source: Yahoo Finance). The data is expected as a data frame, where each column represents a different time series (the date information is not mandatory and can be provided separately).

Examples of time features: Tech Giants Share
date        IBM.Close  AAPL.Close  AMZN.Close  GOOGL.Close  MSFT.Close
2017-01-03   159.8375     29.0375      753.67       808.01       62.58
2017-01-04   161.8164     29.0050      757.18       807.77       62.30
2017-01-05   161.2811     29.1525      780.45       813.02       62.30
2017-01-06   162.0746     29.4775      795.99       825.21       62.84
2017-01-09   160.2773     29.7475      796.92       827.18       62.64
2017-01-10   158.2409     29.7775      795.90       826.01       62.62
2017-01-11   160.3728     29.9375      799.02       829.86       63.19
2017-01-12   160.5641     29.8125      813.64       829.53       62.61
2017-01-13   159.9809     29.7600      817.14       830.94       62.70
2017-01-17   160.5067     30.0000      809.72       827.46       62.53
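
If you want to bring your own data, a data frame in the same shape can be assembled as in the sketch below. The object name my_features is hypothetical; the values are simply the first three rows of the table above.

```r
# Hypothetical object mirroring the layout of time_features:
# one date column followed by one column per time series
my_features <- data.frame(
  date        = as.Date(c("2017-01-03", "2017-01-04", "2017-01-05")),
  IBM.Close   = c(159.8375, 161.8164, 161.2811),
  AAPL.Close  = c(29.0375, 29.0050, 29.1525),
  AMZN.Close  = c(753.67, 757.18, 780.45),
  GOOGL.Close = c(808.01, 807.77, 813.02),
  MSFT.Close  = c(62.58, 62.30, 62.30)
)

# Columns 4 and 5 are the Amazon and Google close prices;
# the dates travel separately
series_only <- my_features[, 4:5]
dates <- my_features$date
```

Keeping the numeric series and the dates apart mirrors the way the data is described above: the series go in as a data frame, the dates as an optional separate vector.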

In the first example, we predict the close price for Amazon and Google. We set seq_len = 30 (sequence length) and use a cross-validation scheme with n_windows = 10 for error measurement.


example1 <- naive(time_features[, 4:5], seq_len = 30, n_windows = 10,  dates = time_features$date)
  time: 89.5 sec elapsed
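
To picture what n_windows = 10 implies, here is a minimal sketch of an expanding-window split. The helper expanding_windows is hypothetical and the exact split points used by the package may differ; the only thing taken from the description is that the series is cut into n_windows + 1 segments, with training data growing at each step.

```r
# Hypothetical helper: cut n_obs observations into n_windows + 1 segments;
# window k trains on the first k segments and tests on segment k + 1
expanding_windows <- function(n_obs, n_windows) {
  breaks <- round(seq(0, n_obs, length.out = n_windows + 2))
  lapply(seq_len(n_windows), function(k) {
    list(train = seq_len(breaks[k + 1]),
         test  = (breaks[k + 1] + 1):breaks[k + 2])
  })
}

windows <- expanding_windows(n_obs = 110, n_windows = 10)
sapply(windows, function(w) length(w$train))  # training size grows window by window
```

Every validation segment lies strictly after its training data, so each error is measured on points the model has not seen.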

The result is a list of different components, as you can see below.

names(example1)
  [1] "exploration" "history"     "best_model"  "time_log"
names(example1$best_model)
  [1] "quant_preds" "plots"       "errors"

exploration collects all the models tested during the exploration. history includes the selected parameters and error metrics for the space explored during random search (besides the prediction score: me, mae, mse, rmsse, mpe, mape, rmae, rrmse, rame, mase, smse, sce, gmrae [3], averaged across features and validation windows). best_model collects information for the best model selected according to the average error metric: you will find the prediction intervals (quant_preds), the visualizations (plots) and the testing error metrics for each time feature (errors).

quant_preds is a list with the predicted results for each time feature (quantiles, min, max, mean, mode, sd, skewness, kurtosis, IQR to range, median range ratio, upside probability and divergence for each time point in the seq_len sequence). The IQR to range is the ratio of the interquartile range to the min-max range; the median range ratio is the ratio of the range above the median to the range below it; the upside probability is the probability of growth compared to the previous point in the time sequence; the divergence is the maximum distance between the cumulative normal curve of each point and that of the previous point in the sequence.
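
The two range-based measures can be written down directly from their definitions. The helper spread_stats below is hypothetical (it is not part of the package); as a sanity check it is applied to the min, quartiles and max reported for 2022-04-22 in the Google prediction table.

```r
# Hypothetical helper computing the two range-based measures from a sample
spread_stats <- function(samples) {
  q <- quantile(samples, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(
    iqr_to_range       = (q[4] - q[2]) / (q[5] - q[1]),  # IQR over min-max range
    median_range_ratio = (q[5] - q[3]) / (q[3] - q[1])   # range above vs below median
  )
}

# Five-number summary of the first forecast date (min, 25%, 50%, 75%, max)
first_day <- c(2312, 2387, 2396, 2403, 2599)
round(spread_stats(first_day), 4)
```

A median range ratio above 1 means the simulated values stretch further above the median than below it, which is consistent with the positive skewness reported for the first forecast dates.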

Examples of prediction for GOOGLE Close Prices
date min 10% 25% 50% 75% 90% max mean sd mode kurtosis skewness iqr_to_range median_range_ratio upside_prob divergence
2022-04-22 2312 2369.0 2387.00 2396.0 2403.00 2422.0 2599 2398.242 36.1526 2394.608 19.9504 3.3027 0.0557 2.4167 NA NA
2022-04-23 2259 2348.0 2374.00 2390.0 2401.00 2422.0 2599 2389.505 40.6684 2392.842 13.5310 2.0235 0.0794 1.5954 0.439 0.1009
2022-04-24 2259 2334.0 2365.00 2386.0 2398.00 2412.0 2599 2382.771 42.9731 2389.698 11.9165 1.7518 0.0971 1.6772 0.433 0.0676
2022-04-26 2181 2319.0 2352.00 2378.0 2394.00 2408.1 2599 2373.262 47.6292 2384.674 9.5104 0.9789 0.1005 1.1218 0.436 0.0922
2022-04-27 2179 2304.0 2342.00 2369.0 2389.00 2402.1 2597 2364.415 50.1594 2379.652 8.3632 0.7157 0.1124 1.2000 0.478 0.0749
2022-04-29 2063 2293.9 2329.00 2362.0 2384.00 2398.0 2597 2355.865 54.1692 2372.984 8.1068 0.3451 0.1030 0.7860 0.464 0.0716
2022-04-30 2063 2285.8 2322.75 2356.0 2379.25 2396.0 2597 2349.760 56.1086 2368.126 7.7924 0.3211 0.1058 0.8225 0.472 0.0462
2022-05-02 2063 2269.9 2311.00 2347.0 2374.00 2393.0 2597 2340.390 60.0182 2358.532 6.8190 0.1505 0.1180 0.8803 0.444 0.0693
2022-05-03 2063 2252.0 2298.00 2338.0 2367.00 2388.0 2597 2330.789 63.6116 2352.590 6.2816 0.0833 0.1292 0.9418 0.465 0.0659
2022-05-05 2063 2241.0 2290.00 2329.0 2361.00 2382.0 2593 2322.422 65.5858 2346.656 5.9728 0.0929 0.1340 0.9925 0.463 0.0530
2022-05-06 2061 2230.9 2280.00 2321.5 2354.25 2378.1 2593 2314.626 67.1350 2331.384 5.8596 0.0464 0.1396 1.0422 0.450 0.0477
2022-05-08 2000 2218.9 2270.75 2312.5 2347.00 2373.0 2589 2305.860 68.5272 2321.340 5.7721 0.0307 0.1295 0.8848 0.467 0.0522
2022-05-09 2000 2206.0 2261.00 2306.0 2342.25 2368.0 2587 2299.410 69.5498 2323.350 5.5168 0.0038 0.1384 0.9183 0.489 0.0377
2022-05-11 1992 2200.0 2253.00 2301.0 2338.00 2363.0 2587 2292.794 70.7851 2318.126 5.5252 -0.0004 0.1429 0.9256 0.457 0.0382
2022-05-12 1992 2192.8 2243.00 2289.5 2329.25 2360.0 2587 2284.182 72.9742 2304.134 5.0821 -0.0566 0.1450 1.0000 0.484 0.0492
2022-05-14 1966 2183.9 2237.00 2284.0 2324.00 2357.0 2587 2277.186 74.7673 2300.406 5.1059 -0.1172 0.1401 0.9528 0.486 0.0390
2022-05-15 1966 2173.0 2224.75 2273.0 2319.00 2349.1 2587 2267.879 77.2393 2281.216 4.8703 -0.1002 0.1518 1.0228 0.460 0.0505
2022-05-17 1919 2154.0 2213.75 2263.0 2309.00 2343.1 2587 2257.539 79.2551 2273.442 4.5546 -0.1452 0.1426 0.9419 0.468 0.0537
2022-05-18 1916 2146.9 2200.00 2253.5 2300.25 2339.0 2587 2248.577 81.0631 2266.092 4.4727 -0.1285 0.1494 0.9881 0.460 0.0455
2022-05-20 1916 2139.0 2195.00 2249.0 2294.00 2332.0 2587 2242.412 81.6024 2258.164 4.4830 -0.1856 0.1475 1.0150 0.482 0.0303
2022-05-21 1916 2121.9 2184.00 2238.5 2287.00 2327.1 2587 2232.815 84.2189 2254.274 4.2418 -0.1886 0.1535 1.0806 0.465 0.0478
2022-05-23 1916 2103.0 2167.00 2227.0 2279.00 2321.0 2587 2220.778 88.0392 2242.648 3.9936 -0.1916 0.1669 1.1576 0.491 0.0583
2022-05-24 1905 2094.0 2158.00 2219.5 2272.00 2316.0 2587 2212.660 89.1097 2237.640 3.8126 -0.1750 0.1672 1.1685 0.476 0.0369
2022-05-26 1902 2085.0 2149.50 2214.0 2266.00 2310.0 2587 2205.269 89.6083 2235.430 3.8217 -0.1632 0.1701 1.1955 0.479 0.0331
2022-05-27 1880 2073.0 2139.00 2206.5 2257.25 2301.0 2587 2196.312 92.0427 2215.920 3.7777 -0.1699 0.1673 1.1654 0.483 0.0407
2022-05-29 1880 2070.0 2130.75 2198.0 2252.00 2296.2 2587 2189.828 92.8848 2218.566 3.6961 -0.1307 0.1715 1.2233 0.460 0.0282
2022-05-30 1871 2057.8 2124.00 2190.0 2247.00 2291.1 2587 2182.532 94.0993 2202.796 3.6965 -0.1193 0.1718 1.2445 0.497 0.0315
2022-06-01 1854 2043.8 2111.00 2179.0 2237.25 2283.1 2587 2172.364 95.9514 2199.506 3.6927 -0.1266 0.1722 1.2554 0.455 0.0434
2022-06-02 1816 2036.9 2104.00 2171.0 2231.00 2276.0 2587 2164.854 97.5723 2194.736 3.7860 -0.1345 0.1647 1.1718 0.488 0.0317
2022-06-04 1805 2023.0 2093.00 2162.0 2219.00 2265.0 2587 2153.757 99.5639 2181.820 3.8943 -0.1543 0.1611 1.1905 0.450 0.0456

For each time feature included in the model, you get a plot of the median with the chosen confidence interval (the ci default is 0.8). As in other packages [4], we provide several statistics to give a better sense of the dynamics related to aleatoric and epistemic uncertainty.

  $AMZN.Close
  [forecast plot for AMZN.Close]

  $GOOGL.Close
  [forecast plot for GOOGL.Close]

An exploration of the hyper-parameter space

The hyper-parameter space defined by seq_len, method, location and cover is very large. Now, let’s try a random search for the best parameter settings. The following example shows how to sample 100 different models for a sequence of 30 time steps.

example2 <- naive(time_features[, 4:5], seq_len = 30, n_samp = 100, n_windows = 10, dates = time_features$date)
  time: 316.6 sec elapsed
History table with ranking of 100 different models
model seq_len cover stride method location pred_score me mae mse rmsse mpe mape rmae rrmse rame mase smse sce gmrae
82 30 0.3627 13 minimum median 0.1176 135.9563 146.4151 36363.95 45.3967 0.0754 0.0814 2.4688 2.4440 9.8024 10.6473 2305.583 300.1828 2.3354
27 30 0.8976 13 minimum median 0.1167 136.2575 146.6415 36447.93 45.4417 0.0756 0.0814 2.4746 2.4484 9.8884 10.6551 2310.521 300.7124 2.3502
93 30 0.2978 13 bhattacharyya median 0.1156 135.4350 145.9678 36128.14 45.2944 0.0753 0.0812 2.4579 2.4359 9.8552 10.6192 2292.120 299.2818 2.3198
67 30 0.3971 2 bhattacharyya mode 0.1139 133.7792 144.9582 35695.15 44.8745 0.0740 0.0803 2.4310 2.4064 9.4554 10.5261 2262.969 295.0029 2.3033
34 30 0.7975 10 minkowski median 0.1136 133.1825 143.9254 35473.83 44.5923 0.0737 0.0799 2.4135 2.3915 9.4922 10.4305 2232.926 292.8412 2.2649
92 30 0.6854 17 manhattan mean 0.1135 134.2135 144.7527 35745.19 44.8967 0.0745 0.0804 2.4300 2.4122 9.8994 10.5091 2261.106 295.7342 2.2790
48 30 0.8856 17 jensen_shannon mean 0.1134 135.1137 145.7697 36105.12 45.2138 0.0748 0.0809 2.4525 2.4337 10.0112 10.5831 2284.964 297.9181 2.3070
26 30 0.8455 1 minkowski mean 0.1134 134.4618 145.4196 35771.52 45.0310 0.0744 0.0807 2.4494 2.4240 9.6353 10.5680 2266.894 296.7357 2.3173
58 30 0.8872 10 jensen_shannon mode 0.1132 134.3650 145.0983 35743.99 44.9361 0.0743 0.0803 2.4501 2.4237 9.5994 10.5206 2256.370 295.5453 2.3238
2 30 0.8976 5 maximum mode 0.1131 134.2458 145.1689 35974.80 44.8898 0.0738 0.0802 2.4321 2.4085 9.5758 10.5040 2264.191 294.7563 2.2982
37 30 0.4844 17 manhattan mean 0.1130 127.8763 139.0565 32610.98 43.4224 0.0718 0.0782 2.3396 2.3236 9.5364 10.2348 2112.513 286.4325 2.1902
50 30 0.5965 2 jensen_shannon median 0.1128 134.2330 145.2163 35659.76 45.0061 0.0744 0.0806 2.4542 2.4298 9.6182 10.5474 2259.035 295.9662 2.2994
15 30 0.5805 19 maximum median 0.1128 131.9635 143.3606 34957.72 44.3704 0.0727 0.0792 2.4149 2.3922 9.7234 10.3915 2213.092 290.3270 2.3244
16 30 0.2313 16 jensen_shannon median 0.1127 134.8424 145.7357 36156.25 45.1162 0.0745 0.0807 2.4274 2.4063 9.5478 10.5931 2293.582 297.7759 2.2841
73 30 0.6582 17 maximum mean 0.1126 134.5706 145.1971 35759.61 45.1072 0.0747 0.0806 2.4478 2.4308 9.9659 10.5572 2270.445 297.1437 2.2975
79 30 0.1817 5 maximum mean 0.1125 127.3835 139.0384 32893.46 43.2785 0.0710 0.0777 2.3338 2.3122 9.2121 10.1917 2110.181 284.1158 2.2011
72 30 0.7959 17 jensen_shannon mode 0.1123 134.4455 145.2099 35767.29 45.0619 0.0745 0.0806 2.4426 2.4247 9.7527 10.5524 2266.938 296.7464 2.2808
98 30 0.1649 1 minimum mode 0.1123 131.3441 142.3096 34146.60 44.2251 0.0731 0.0795 2.3853 2.3602 9.4146 10.4156 2190.982 292.0144 2.2406
87 30 0.4924 4 minimum median 0.1123 134.3477 145.3388 35891.70 45.0473 0.0744 0.0806 2.4413 2.4190 9.4336 10.5662 2274.899 296.5515 2.3052
62 30 0.8375 2 maximum mean 0.1121 135.3867 146.2512 36190.23 45.2758 0.0751 0.0812 2.4669 2.4414 9.6188 10.6128 2288.168 298.1770 2.3109
96 30 0.6037 1 canberra1 mode 0.1121 135.4234 146.4813 36439.15 45.2818 0.0747 0.0809 2.4666 2.4400 9.5890 10.6090 2296.974 297.6325 2.3354
46 30 0.3330 17 minimum mode 0.1120 134.6943 145.1731 35734.15 45.0663 0.0746 0.0805 2.4477 2.4297 10.0462 10.5426 2263.614 296.9290 2.3121
99 30 0.8359 2 jensen_shannon median 0.1119 134.6419 145.4952 35936.26 45.0350 0.0745 0.0807 2.4432 2.4188 9.4262 10.5614 2274.328 296.5750 2.2962
71 30 0.7158 10 jensen_shannon mean 0.1119 134.9617 145.6226 35973.62 45.0739 0.0745 0.0805 2.4603 2.4339 9.7124 10.5486 2268.644 296.5633 2.3495
78 30 0.8560 1 kullback_leibler mean 0.1118 135.0255 146.0189 36157.13 45.1892 0.0747 0.0811 2.4609 2.4352 9.5308 10.5958 2286.952 297.3935 2.3316
18 30 0.5252 17 jensen_shannon mean 0.1117 135.0746 145.6853 35875.60 45.1971 0.0749 0.0810 2.4552 2.4374 9.8950 10.5820 2274.482 297.8978 2.3015
100 30 0.8047 8 minkowski median 0.1116 133.9162 144.9855 35849.09 44.8657 0.0741 0.0805 2.4188 2.3956 9.3968 10.5364 2268.735 295.5204 2.2878
8 30 0.6069 4 bhattacharyya mode 0.1116 135.0376 145.9415 36026.74 45.2042 0.0748 0.0810 2.4539 2.4296 9.4592 10.6125 2285.709 298.0416 2.3258
61 30 0.2954 1 maximum median 0.1114 130.8272 142.3956 34611.57 44.1336 0.0722 0.0789 2.3788 2.3584 9.4052 10.3694 2196.589 289.4805 2.2356
70 30 0.4235 5 canberra1 median 0.1114 134.4999 145.4162 35875.43 44.9972 0.0743 0.0804 2.4510 2.4255 9.5046 10.5390 2267.466 295.8257 2.3243
53 30 0.5124 6 jensen_shannon median 0.1114 135.3996 146.4460 36193.30 45.3414 0.0751 0.0813 2.4707 2.4437 9.6626 10.6504 2296.205 299.1325 2.3296
85 30 0.8840 17 euclidean mode 0.1112 126.8314 138.3389 32319.81 43.2181 0.0711 0.0778 2.3260 2.3088 9.2310 10.1879 2095.643 284.5070 2.1843
33 30 0.7879 10 jensen_shannon mean 0.1112 134.9808 145.6351 36010.73 45.0610 0.0746 0.0807 2.4596 2.4323 9.7015 10.5472 2269.219 296.4922 2.3218
94 30 0.3595 17 minkowski mean 0.1112 124.3290 135.9015 31242.13 42.5106 0.0698 0.0763 2.2786 2.2649 9.1348 10.0337 2032.975 279.7829 2.1430
10 30 0.2017 10 jensen_shannon median 0.1110 135.3778 145.9996 36272.79 45.1488 0.0746 0.0808 2.4577 2.4312 9.5820 10.5617 2282.016 296.8976 2.3402
4 30 0.2217 8 canberra1 median 0.1109 135.2561 146.3118 36271.98 45.2336 0.0748 0.0810 2.4586 2.4316 9.4779 10.6135 2294.230 297.8154 2.3224
41 30 0.3386 6 minimum mode 0.1109 134.9547 146.1150 36081.86 45.2015 0.0747 0.0811 2.4624 2.4353 9.5804 10.6152 2286.082 297.7167 2.3368
45 30 0.6181 16 minkowski median 0.1107 131.3382 142.8209 34734.05 44.3518 0.0733 0.0797 2.3826 2.3614 9.4232 10.4557 2219.485 292.6693 2.2377
12 30 0.1184 13 jensen_shannon mode 0.1106 132.1042 143.4229 34787.81 44.5472 0.0735 0.0801 2.4283 2.4034 9.6261 10.4834 2221.971 293.7161 2.2856
23 30 0.3370 16 jensen_shannon median 0.1106 136.5036 147.3980 36685.07 45.6019 0.0756 0.0818 2.4700 2.4437 9.7102 10.7218 2327.403 301.7155 2.3184
1 30 0.5484 18 canberra1 median 0.1105 134.6698 146.4322 36397.29 45.2828 0.0744 0.0812 2.4560 2.4301 9.3095 10.6284 2307.402 297.3109 2.3235
97 30 0.6133 17 euclidean mean 0.1104 130.0838 141.2379 33861.09 43.9032 0.0723 0.0788 2.3740 2.3576 9.5876 10.3040 2158.410 288.4696 2.2128
54 30 0.2802 6 canberra1 mode 0.1104 134.8900 145.9643 35978.72 45.1777 0.0748 0.0810 2.4578 2.4306 9.6082 10.6183 2283.052 297.9936 2.3231
80 30 0.2818 3 kullback_leibler mean 0.1104 134.6300 145.7845 36059.20 45.0810 0.0744 0.0808 2.4552 2.4280 9.4526 10.5751 2277.861 296.3767 2.3396
51 30 0.3779 6 bhattacharyya mean 0.1102 135.9112 147.1401 36563.43 45.4628 0.0752 0.0816 2.4825 2.4523 9.5603 10.6811 2312.669 299.5022 2.3657
49 30 0.2161 18 minkowski median 0.1101 118.7966 131.7988 29273.44 41.3803 0.0673 0.0746 2.2035 2.1879 7.9332 9.8252 1942.136 270.9560 2.0669
63 30 0.6005 2 minkowski median 0.1097 129.4861 141.1656 34119.47 43.7884 0.0718 0.0784 2.3632 2.3421 9.2284 10.2890 2162.279 287.0399 2.2228
66 30 0.2033 2 manhattan mean 0.1096 124.1231 136.0273 31209.00 42.4174 0.0695 0.0762 2.2844 2.2638 8.6550 10.0238 2022.228 278.5039 2.1561
20 30 0.8055 9 kullback_leibler mode 0.1096 134.8628 146.4201 36243.44 45.2559 0.0746 0.0812 2.4657 2.4368 9.4066 10.6322 2294.359 297.2637 2.3492
11 30 0.3418 18 jensen_shannon mode 0.1095 132.5239 144.4656 35551.92 44.7246 0.0734 0.0802 2.4150 2.3940 9.2446 10.4964 2257.021 293.0615 2.2862
17 30 0.5973 22 minimum mode 0.1095 141.5012 150.8588 38203.51 46.5567 0.0782 0.0836 2.5544 2.5186 10.1851 10.9370 2412.425 310.9208 2.4665
52 30 0.2570 22 canberra1 mode 0.1093 141.7500 151.2595 38686.59 46.6724 0.0781 0.0837 2.5495 2.5170 10.0695 10.9398 2438.078 310.7661 2.4473
75 30 0.1312 3 canberra1 median 0.1092 135.0946 146.1694 36026.36 45.2003 0.0747 0.0810 2.4674 2.4372 9.5655 10.6206 2283.976 297.9552 2.3528
29 30 0.5965 5 minkowski mode 0.1091 124.1491 136.1535 31512.33 42.4124 0.0692 0.0761 2.2808 2.2598 8.7510 10.0004 2030.409 277.4643 2.1487
91 30 0.3851 9 maximum mean 0.1091 134.4238 145.9933 36045.46 45.0978 0.0741 0.0809 2.4516 2.4198 9.3769 10.6034 2283.409 296.4262 2.3485
25 30 0.3258 6 minkowski mean 0.1090 125.7110 137.7352 32040.75 42.8598 0.0703 0.0772 2.3146 2.2910 9.0305 10.1367 2069.154 281.7223 2.1838
74 30 0.6197 5 manhattan median 0.1090 128.2259 140.0396 33525.03 43.4762 0.0710 0.0779 2.3466 2.3274 9.2313 10.2126 2129.947 284.3884 2.2073
36 30 0.3507 19 euclidean mode 0.1090 114.1260 127.4087 27481.36 40.1182 0.0650 0.0723 2.1340 2.1176 8.5330 9.5442 1839.741 261.7064 2.0385
38 30 0.3378 14 bhattacharyya median 0.1089 135.3790 147.0794 36621.21 45.4672 0.0751 0.0817 2.4708 2.4418 9.5036 10.6954 2321.066 299.0361 2.3694
30 30 0.5132 9 maximum median 0.1089 133.6645 145.3503 35651.71 44.9634 0.0740 0.0807 2.4446 2.4148 9.4444 10.5792 2267.256 295.5030 2.3316
57 30 0.3835 2 minkowski median 0.1088 125.3553 137.2441 31529.18 42.7644 0.0700 0.0769 2.3144 2.2934 9.1666 10.1084 2042.740 281.1055 2.1978
35 30 0.3066 1 euclidean mean 0.1087 126.0470 137.9348 32296.02 42.9020 0.0703 0.0771 2.3102 2.2884 8.9193 10.1278 2076.881 281.5236 2.1883
44 30 0.3507 5 maximum median 0.1087 129.8376 141.3021 34087.03 43.8246 0.0718 0.0784 2.3636 2.3404 9.3612 10.3064 2168.325 287.7921 2.2367
59 30 0.6814 12 minkowski mean 0.1087 133.5842 145.1508 35984.04 44.9137 0.0738 0.0804 2.4231 2.3998 9.3578 10.5448 2273.419 294.7323 2.3035
60 30 0.8560 9 euclidean mean 0.1086 134.9346 146.5317 36322.85 45.2566 0.0746 0.0813 2.4629 2.4326 9.3365 10.6341 2298.550 297.2564 2.3649
5 30 0.1585 8 maximum mean 0.1086 127.5504 139.1678 32573.89 43.2514 0.0713 0.0779 2.3346 2.3091 9.0656 10.2270 2100.776 285.2037 2.1955
13 30 0.7711 21 canberra1 median 0.1085 134.2207 146.0625 36245.03 45.1099 0.0741 0.0810 2.4505 2.4235 9.5929 10.5870 2285.282 294.8591 2.3420
28 30 0.8928 14 bhattacharyya mode 0.1084 134.8567 146.4420 36197.46 45.3230 0.0749 0.0814 2.4646 2.4367 9.4640 10.6615 2301.431 298.2230 2.3750
21 30 0.8031 21 euclidean median 0.1084 132.9977 144.9425 35830.19 44.7726 0.0735 0.0806 2.4216 2.3973 9.3791 10.5130 2259.742 292.5273 2.3074
64 30 0.3090 11 jensen_shannon mean 0.1084 139.7928 149.5234 37572.80 46.1613 0.0774 0.0830 2.5288 2.4946 10.0602 10.8447 2374.697 307.4434 2.4420
24 30 0.5805 8 minkowski median 0.1083 128.8408 140.5684 33521.07 43.6479 0.0718 0.0784 2.3569 2.3349 9.1366 10.2882 2142.680 286.8858 2.2093
22 30 0.8191 11 minkowski mean 0.1080 139.8112 149.6610 37725.00 46.2274 0.0774 0.0830 2.5300 2.4976 10.0247 10.8592 2385.225 307.6527 2.4303
3 30 0.3563 14 euclidean mode 0.1079 117.1689 130.6726 29357.36 41.0752 0.0665 0.0741 2.1736 2.1490 7.4737 9.7941 1945.480 268.7716 2.0892
7 30 0.2161 4 minkowski mean 0.1079 124.1419 136.1154 31040.43 42.4856 0.0697 0.0766 2.2908 2.2699 8.8438 10.0637 2023.761 279.6485 2.1631
31 30 0.2690 16 maximum median 0.1077 127.6972 139.5232 32938.89 43.4049 0.0713 0.0782 2.3270 2.3050 9.2084 10.2710 2127.363 286.2577 2.1978
6 30 0.2818 21 kullback_leibler median 0.1074 133.4495 145.2760 35887.29 44.8993 0.0736 0.0806 2.4290 2.4058 9.3908 10.5320 2264.910 293.2331 2.3198
32 30 0.8439 11 manhattan median 0.1074 138.8773 148.7082 37401.46 45.9050 0.0768 0.0824 2.4978 2.4662 9.8152 10.7841 2359.268 305.3974 2.3991
77 30 0.1256 14 minimum mode 0.1071 133.5458 145.2642 35732.33 44.8954 0.0739 0.0806 2.4456 2.4166 9.3441 10.5402 2264.905 294.2998 2.3540
83 30 0.6349 11 maximum median 0.1071 139.3713 149.1376 37514.32 45.9733 0.0769 0.0824 2.5117 2.4775 10.0982 10.8059 2363.014 306.1941 2.4315
89 30 0.6862 11 minkowski mean 0.1065 138.8414 148.5883 37331.62 45.8286 0.0769 0.0825 2.5002 2.4665 9.8653 10.7688 2352.560 305.0984 2.4133
14 30 0.3843 22 manhattan mean 0.1065 132.6796 142.5832 34395.17 44.1229 0.0741 0.0796 2.3941 2.3620 9.4294 10.4138 2192.272 294.2344 2.3218
47 30 0.7687 22 manhattan mode 0.1060 131.3013 141.5186 33502.82 44.0758 0.0739 0.0798 2.3848 2.3557 9.4406 10.4469 2179.410 294.7557 2.2826
68 30 0.1016 10 bhattacharyya mode 0.1056 129.5518 141.1246 34057.91 43.7621 0.0719 0.0786 2.3698 2.3459 9.5304 10.2997 2169.723 287.2631 2.2689
76 30 0.5132 14 manhattan median 0.1056 125.1600 137.7180 31864.72 42.8780 0.0702 0.0775 2.3198 2.2936 8.7753 10.1544 2065.204 281.1245 2.2101
65 30 0.4115 14 kullback_leibler mode 0.1055 134.2611 146.0308 36045.79 45.2002 0.0748 0.0815 2.4542 2.4246 9.4565 10.6498 2296.058 297.4744 2.3585
69 30 0.3987 22 maximum mean 0.1054 136.6511 146.3606 36205.41 45.1600 0.0758 0.0812 2.4602 2.4260 9.7829 10.6286 2288.416 300.9779 2.3917
55 30 0.5028 20 maximum mean 0.1052 136.2165 147.0977 36597.00 45.3308 0.0751 0.0814 2.4745 2.4381 9.6078 10.6745 2305.985 299.6266 2.3822
95 30 0.8904 20 manhattan median 0.1051 136.1019 147.0967 36612.20 45.3750 0.0753 0.0816 2.4732 2.4365 9.4894 10.6947 2318.346 299.9388 2.3726
9 30 0.1384 7 bhattacharyya mode 0.1048 129.6559 141.8586 34499.66 43.8925 0.0718 0.0789 2.3712 2.3480 9.1860 10.3146 2184.573 286.5098 2.2463
19 30 0.4275 12 minkowski median 0.1047 123.9351 136.3320 31102.90 42.4647 0.0693 0.0766 2.2943 2.2702 8.8913 10.0592 2021.238 278.6496 2.1915
40 30 0.2257 15 minimum mode 0.1046 131.6340 144.1205 35138.36 44.6320 0.0728 0.0801 2.4280 2.3998 9.1925 10.5027 2232.206 291.4605 2.3286
42 30 0.8287 15 bhattacharyya median 0.1046 132.4812 144.6578 35404.05 44.8009 0.0734 0.0804 2.4388 2.4113 9.2931 10.5331 2248.846 292.8452 2.3400
56 30 0.8568 15 minimum median 0.1041 132.1232 144.4772 35480.58 44.6800 0.0730 0.0801 2.4281 2.4002 9.1794 10.4963 2242.028 291.3914 2.3356
90 30 0.1120 17 bhattacharyya mode 0.1038 128.3248 140.4523 33804.34 43.6106 0.0711 0.0781 2.3521 2.3329 9.1537 10.2638 2151.785 285.0004 2.2044
86 30 0.7214 20 euclidean mode 0.1032 123.7912 136.0037 30770.05 42.4490 0.0699 0.0770 2.3005 2.2672 8.3907 10.1166 2025.445 280.4462 2.1975
43 30 0.4243 11 euclidean mean 0.1032 131.9922 142.4048 34182.61 44.1175 0.0736 0.0794 2.4000 2.3697 9.5699 10.4214 2185.586 293.5018 2.3003
84 30 0.1601 20 minimum mode 0.1030 136.5197 147.3576 36393.12 45.5026 0.0754 0.0819 2.4994 2.4600 9.5820 10.7375 2303.994 301.5211 2.3955
81 30 0.1865 22 minkowski mode 0.1022 121.9391 132.5114 29633.49 41.2482 0.0680 0.0742 2.2054 2.1779 8.2214 9.7630 1928.981 273.2001 2.1153
88 30 0.5372 20 minkowski median 0.1016 127.8345 139.8420 32798.77 43.3876 0.0713 0.0784 2.3554 2.3243 9.0295 10.2826 2112.349 285.7437 2.2556
39 30 0.1184 14 manhattan median 0.1011 115.3293 128.8623 27810.27 40.4749 0.0656 0.0733 2.1618 2.1397 8.0196 9.6478 1855.701 264.1898 2.0811
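
The random search behind a table like this one can be pictured as drawing one configuration per row. The sketch below is an illustration only: the sampling ranges for cover and stride are assumptions read off the table, not documented package defaults, and the name grid is hypothetical.

```r
set.seed(42)
n_samp <- 100

# Candidate values as they appear in the history table above
methods <- c("euclidean", "manhattan", "minkowski", "maximum", "minimum",
             "canberra1", "bhattacharyya", "jensen_shannon", "kullback_leibler")
locations <- c("mean", "median", "mode")

# One random configuration per row (ranges for cover and stride are assumed)
grid <- data.frame(
  seq_len  = 30,
  cover    = round(runif(n_samp, min = 0.1, max = 0.9), 4),
  stride   = sample(1:22, n_samp, replace = TRUE),
  method   = sample(methods, n_samp, replace = TRUE),
  location = sample(locations, n_samp, replace = TRUE)
)

head(grid)
```

Each sampled configuration would then be fitted and cross-validated, and the resulting rows sorted by decreasing pred_score, as in the table.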

If we compare the error statistics of the best model in example2 with those of the model in example1, the prediction score improves slightly for both Amazon and Google, while the error metrics are nearly unchanged (marginally lower for Google, marginally higher for Amazon). All the relative and scaled error metrics default to a naive benchmark, but you can choose more challenging thresholds (like the deviation of the whole time feature or the average of the whole predicted sequence).

The error statistics from example1 (averaged across 10 expanding validation windows):

example1$best_model$errors
              pred_score       me      mae      mse   rmsse    mpe   mape   rmae
  AMZN.Close      0.1338 169.8016 185.0265 55874.76 53.3262 0.0794 0.0877 2.2428
  GOOGL.Close     0.0992 101.4664 107.1904 16207.96 37.4059 0.0716 0.0751 2.6984
               rrmse    rame    mase     smse      sce  gmrae
  AMZN.Close  2.2683  4.1215 11.3708 3154.579 316.5591 2.0452
  GOOGL.Close 2.6257 15.7802  9.9103 1432.759 283.6309 2.6248

The error statistics from example2 (as above, averaged across 10 expanding validation windows):

example2$best_model$errors
              pred_score       me      mae      mse   rmsse    mpe   mape   rmae
  AMZN.Close      0.1357 170.7348 185.8427 56560.19 53.4571 0.0796 0.0879 2.2483
  GOOGL.Close     0.0995 101.1778 106.9875 16167.70 37.3364 0.0712 0.0748 2.6893
               rrmse    rame    mase     smse      sce  gmrae
  AMZN.Close  2.2717  4.1114 11.3971 3181.336 317.4391 2.0492
  GOOGL.Close 2.6163 15.4933  9.8975 1429.830 282.9264 2.6217
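
To read the two printouts side by side, the figures can be differenced directly. The sketch below copies pred_score and mae from the outputs above into two small matrices (ex1, ex2 and delta are hypothetical names) rather than re-running the models.

```r
# Values copied from the printed errors of example1 and example2
ex1 <- rbind(AMZN.Close  = c(pred_score = 0.1338, mae = 185.0265),
             GOOGL.Close = c(pred_score = 0.0992, mae = 107.1904))
ex2 <- rbind(AMZN.Close  = c(pred_score = 0.1357, mae = 185.8427),
             GOOGL.Close = c(pred_score = 0.0995, mae = 106.9875))

# Positive values mean example2 is higher than example1
delta <- ex2 - ex1
round(delta, 4)
```

The prediction score rises for both series, while the mean absolute error rises slightly for Amazon and falls slightly for Google.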

The prediction score improves slightly for both time features, but we are still using a naive approach to measure scaled and relative errors. Let’s try switching to deviation as the scale and average as the benchmark, which are more challenging evaluation criteria.

example3 <- naive(time_features[, 4:5], seq_len = 30, n_windows = 10, dates = time_features$date, error_scale = "deviation", error_benchmark = "average")
  time: 88.72 sec elapsed

As you can see, the relative and scaled measures change noticeably as we raise the bar of our expectations:

example3$best_model$errors
              pred_score       me      mae      mse   rmsse    mpe   mape   rmae
  AMZN.Close      0.1338 169.8016 185.0265 55874.76 11.9350 0.0794 0.0877 1.0797
  GOOGL.Close     0.0992 101.4664 107.1904 16207.96 10.3578 0.0716 0.0751 1.0544
               rrmse rame   mase     smse     sce  gmrae
  AMZN.Close  1.1810    1 0.6396 165.4390 18.1701 0.8392
  GOOGL.Close 1.1393    1 0.7990 112.7934 23.0830 0.8762
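
Why the scaled measures move so much while me, mae and mse stay the same can be seen on a toy case: the absolute error is identical, only the denominator changes. The sketch below uses made-up numbers and plain base R; the deviation scale is taken here as the mean absolute deviation from the mean, which is an assumption about what "deviation" means and may not match the package's exact formula.

```r
# Toy data (illustrative values only)
train    <- c(100, 102, 101, 105, 107, 110)
holdout  <- c(112, 115)
forecast <- c(111, 113)

# The absolute error does not depend on the chosen scale
mae <- mean(abs(holdout - forecast))

# Naive scale: average one-step absolute change in the training data
naive_scale <- mean(abs(diff(train)))

# Deviation scale (assumed definition): mean absolute deviation from the mean
deviation_scale <- mean(abs(train - mean(train)))

mase_naive     <- mae / naive_scale
mase_deviation <- mae / deviation_scale
c(naive = mase_naive, deviation = mase_deviation)
```

For a trending series the deviation around the mean is larger than the typical one-step change, so the same forecast error looks smaller once it is scaled by deviation.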

Some useful references


  1. Missing-value imputation is handled by the imputeTS package. For more information: https://cran.r-project.org/web/packages/imputeTS/index.html

  2. In some cases you may want to operate on smoothed time features. In this case, naive calls the fANCOVA package. Details are available here: https://cran.r-project.org/web/packages/fANCOVA/index.html

  3. The metrics are calculated using the greybox package. For reference, see: https://cran.r-project.org/web/packages/greybox/index.html

  4. Other packages focused on time feature analysis that could be of interest here:

    - AUDREX, https://cran.r-project.org/web/packages/audrex/index.html
    - PROTEUS, https://cran.r-project.org/web/packages/proteus/index.html
    - JENGA, https://cran.r-project.org/web/packages/jenga/index.html
    - TETRAGON, https://cran.r-project.org/web/packages/tetragon/index.html
    - SPOOKY, https://cran.r-project.org/web/packages/spooky/index.html
    - DYMO, https://cran.r-project.org/web/packages/dymo/index.html