For some of the competitions, the data are available in the form of R packages, with each series described by the following fields:
| Field name | Description |
|---|---|
| sn | Name of the series |
| st | Series number and period. For example “Y1” denotes first yearly series, “Q20” denotes 20th quarterly series and so on |
| n | The number of observations in the time series |
| h | The number of required forecasts |
| description | A short description of the time series |
| period | Interval of the time series |
| type | The type of series |
| x | A time series of length n (the historical data) |
| xx | A time series of length h (the future data) |
A. Time series table schema (TSTS):
| Field name(column name) | Description | Example |
|---|---|---|
| series_id | Time series identifier - a unique name that identifies a time series | “Y1” |
| timestamp | Any representation of the period to which the observation relates | “01.01.1997” for daily data, “Sep 1997” for monthly data, “Week 49, 1997” for weekly data |
| value | The value observed | 1000 |
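As an illustration of how a series stored with the fields `st`, `x` and `xx` might be mapped onto the TSTS, the following is a minimal sketch in base R. The `series` object and the helper `to_tsts()` are hypothetical and introduced only for this example; the values are the Y1 observations listed in the appendix tables, split at the 1988 origin used there, and the `sn` value is a placeholder.

```r
# Hypothetical series object that mimics the package fields described above.
series <- list(
  sn = "N0001",  # placeholder name, not taken from the text
  st = "Y1",
  n  = 5,
  h  = 3,
  x  = ts(c(3103.96, 3360.27, 3807.63, 4387.88, 4936.99), start = 1984),
  xx = ts(c(5379.75, 6158.68, 6876.58), start = 1989)
)

# Hypothetical helper: stack historical (x) and future (xx) actuals
# into TSTS rows (series_id, timestamp, value).
to_tsts <- function(s) {
  data.frame(
    series_id = s$st,
    timestamp = c(as.numeric(time(s$x)), as.numeric(time(s$xx))),
    value     = c(as.numeric(s$x), as.numeric(s$xx)),
    stringsAsFactors = FALSE
  )
}

tsts <- to_tsts(series)
print(tsts)
```

Here both the historical and the future actuals go into the same table, on the assumption that the TSTS holds every observed value regardless of the forecast origin.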
B. Forecast dynamic table schemas (FDTS):
| series_id | method | timestamp | origin_timestamp | horizon | variable | value |
|---|---|---|---|---|---|---|
| Time series ID for which the forecast was calculated | Method identifier | Any representation of the period to which the forecast relates | Origin of the forecast (provided in a timestamp format) | Forecast horizon | The name of the variable that describes the forecasting result | The value of the variable |
C. Forecast Tables Schema (FTS):
| series_id | method | timestamp | origin_timestamp | horizon | forecast | Lo90 | Hi90 |
|---|---|---|---|---|---|---|---|
| … | … | … | … | … | … | … | … |
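The relationship between the two forecast schemas can be sketched as a long-to-wide conversion: each distinct `variable` in the FDTS becomes a column in the FTS. The example below is a minimal base R sketch, not a prescribed procedure. It assumes that the interval bounds are stored in the FDTS under the variable names `Lo90` and `Hi90`, and it uses the Y1, method A values from the appendix tables.

```r
# FDTS (long format): one row per series, method, period and variable.
fdts <- data.frame(
  series_id        = "Y1",
  method           = "A",
  timestamp        = rep(c(1989, 1990), each = 3),
  origin_timestamp = 1988,
  horizon          = rep(1:2, each = 3),
  variable         = rep(c("forecast", "Lo90", "Hi90"), times = 2),
  value            = c(5406.43, 5183.349, 5629.511,
                       5875.96, 5652.879, 6099.041),
  stringsAsFactors = FALSE
)

# Pivot to FTS (wide format): one column per variable.
id_cols <- c("series_id", "method", "timestamp", "origin_timestamp", "horizon")
fts <- reshape(fdts, idvar = id_cols, timevar = "variable", direction = "wide")

# reshape() prefixes the new columns with "value."; strip the prefix.
names(fts) <- sub("^value\\.", "", names(fts))
rownames(fts) <- NULL
print(fts)
```

The long form can take new variables without changing the table definition, while the wide form is more convenient when the set of variables is fixed in advance.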
Time series table schema (TSTS):
TSTS <- read.csv("C:/Users/svcuo/Desktop/data/TSTS.csv")
TSTS$X <- NULL
head(TSTS, 10)
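| series_id | value | timestamp |
|---|---|---|
| Y1 | 3103.96 | 1984 |
| Y1 | 3360.27 | 1985 |
| Y1 | 3807.63 | 1986 |
| Y1 | 4387.88 | 1987 |
| Y1 | 4936.99 | 1988 |
| Y1 | 5379.75 | 1989 |
| Y1 | 6158.68 | 1990 |
| Y1 | 6876.58 | 1991 |
| Y2 | 5389.80 | 1984 |
| Y2 | 5384.40 | 1985 |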
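The `TSTS$X <- NULL` step above suggests that the CSV file was written with row names, which `read.csv` then reads back as a column named `X`; this is an assumption about how the file was produced. A minimal sketch of a round trip that avoids the extra column, using a temporary file rather than the path above:

```r
# A few TSTS rows taken from the listing above.
tsts <- data.frame(
  series_id = c("Y1", "Y1", "Y2"),
  value     = c(3103.96, 3360.27, 5389.80),
  timestamp = c(1984, 1985, 1984),
  stringsAsFactors = FALSE
)

# Write without row names so that no extra "X" column appears on reading.
path <- tempfile(fileext = ".csv")
write.csv(tsts, path, row.names = FALSE)

# Read the table back; the column set matches the schema exactly.
back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```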
Forecast dynamic table schemas (FDTS):
FDTS <- read.csv("C:/Users/svcuo/Desktop/data/FDTS.csv")
FDTS$X <- NULL
head(FDTS, 10)
| series_id | actual | method | timestamp | origin_timestamp | horizon | variable | value |
|---|---|---|---|---|---|---|---|
| Y1 | 5379.75 | A | 1989 | 1988 | 1 | forecast | 5406.43 |
| Y1 | 6158.68 | A | 1990 | 1988 | 2 | forecast | 5875.96 |
| Y1 | 6876.58 | A | 1991 | 1988 | 3 | forecast | 6345.48 |
| Y1 | 5379.75 | B | 1989 | 1988 | 1 | forecast | 5473.87 |
| Y1 | 6158.68 | B | 1990 | 1988 | 2 | forecast | 6010.43 |
| Y1 | 6876.58 | B | 1991 | 1988 | 3 | forecast | 6546.63 |
| Y1 | 5379.75 | C | 1989 | 1988 | 1 | forecast | 5406.43 |
| Y1 | 6158.68 | C | 1990 | 1988 | 2 | forecast | 5875.96 |
| Y1 | 6876.58 | C | 1991 | 1988 | 3 | forecast | 6345.48 |
| Y2 | 4793.20 | A | 1989 | 1988 | 1 | forecast | 4142.60 |
Forecast Tables Schema (FTS):
FTS <- read.csv("C:/Users/svcuo/Desktop/data/FTS.csv")
FTS$X <- NULL
head(FTS, 10)
| series_id | actual | method | timestamp | origin_timestamp | forecast | horizon | Lo90 | Hi90 |
|---|---|---|---|---|---|---|---|---|
| Y1 | 5379.75 | A | 1989 | 1988 | 5406.43 | 1 | 5183.349 | 5629.511 |
| Y1 | 6158.68 | A | 1990 | 1988 | 5875.96 | 2 | 5652.879 | 6099.041 |
| Y1 | 6876.58 | A | 1991 | 1988 | 6345.48 | 3 | 6122.399 | 6568.561 |
| Y1 | 5379.75 | B | 1989 | 1988 | 5473.87 | 1 | 5250.789 | 5696.951 |
| Y1 | 6158.68 | B | 1990 | 1988 | 6010.43 | 2 | 5787.349 | 6233.511 |
| Y1 | 6876.58 | B | 1991 | 1988 | 6546.63 | 3 | 6323.549 | 6769.711 |
| Y1 | 5379.75 | C | 1989 | 1988 | 5406.43 | 1 | 5183.349 | 5629.511 |
| Y1 | 6158.68 | C | 1990 | 1988 | 5875.96 | 2 | 5652.879 | 6099.041 |
| Y1 | 6876.58 | C | 1991 | 1988 | 6345.48 | 3 | 6122.399 | 6568.561 |
| Y2 | 4793.20 | A | 1989 | 1988 | 4142.60 | 1 | 3919.519 | 4365.681 |
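Once forecasts and actuals sit in one table of this shape, accuracy summaries reduce to grouped aggregations. The following is a minimal sketch in base R, assuming the absolute percentage error as the point-forecast metric and the share of actuals falling inside [Lo90, Hi90] as a coverage measure; both choices are illustrative rather than the only options. The rows are the Y1 forecasts of methods A and B from the listing above.

```r
# FTS rows for series Y1, methods A and B.
fts <- data.frame(
  series_id = "Y1",
  actual    = rep(c(5379.75, 6158.68, 6876.58), times = 2),
  method    = rep(c("A", "B"), each = 3),
  horizon   = rep(1:3, times = 2),
  forecast  = c(5406.43, 5875.96, 6345.48, 5473.87, 6010.43, 6546.63),
  Lo90      = c(5183.349, 5652.879, 6122.399, 5250.789, 5787.349, 6323.549),
  Hi90      = c(5629.511, 6099.041, 6568.561, 5696.951, 6233.511, 6769.711),
  stringsAsFactors = FALSE
)

# Absolute percentage error of each forecast.
fts$ape <- 100 * abs(fts$actual - fts$forecast) / abs(fts$actual)

# Mean APE for each method (rows) and horizon (columns).
mape <- tapply(fts$ape, list(fts$method, fts$horizon), mean)

# Share of actuals that fall inside the 90% interval, per method.
fts$covered <- fts$actual >= fts$Lo90 & fts$actual <= fts$Hi90
coverage <- tapply(fts$covered, fts$method, mean)

print(mape)
print(coverage)
```

With several series in the table, the same `tapply` call averages across series within each method and horizon, which gives a method-by-horizon layout.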
Prediction-Realization Diagram:
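A diagram of this kind can be drawn directly from the forecast table. The sketch below uses base R graphics and assumes the common layout in which forecasts are plotted against actuals with a 45-degree reference line; the axis assignment and plotting symbols are choices made for this example. The data are the Y1 forecasts of methods A and B from the FTS listing.

```r
# Actuals and forecasts for series Y1, methods A and B.
fts <- data.frame(
  actual   = rep(c(5379.75, 6158.68, 6876.58), times = 2),
  forecast = c(5406.43, 5875.96, 6345.48, 5473.87, 6010.43, 6546.63),
  method   = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE
)

# Use the same limits on both axes so that the diagonal is at 45 degrees.
lims <- range(c(fts$actual, fts$forecast))

plot(fts$actual, fts$forecast,
     xlim = lims, ylim = lims,
     pch  = ifelse(fts$method == "A", 1, 2),
     xlab = "Actual", ylab = "Forecast",
     main = "Prediction-realization diagram")

# Points on this line are perfect forecasts; points below it underpredict.
abline(0, 1, lty = 2)
legend("topleft", legend = c("A", "B"), pch = 1:2, title = "method")
```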
Fanchart:
| Method | horizon = 1 | horizon = 2 | horizon = 3 | horizon = 4 | horizon = 5 | horizon = 6 |
|---|---|---|---|---|---|---|
| NAIVE2 | 8.360053 | 19.23712 | 21.70531 | 23.45871 | 25.17578 | 27.35164 |
| SINGLE | 8.426719 | 19.53460 | 21.70985 | 23.59725 | 25.35748 | 27.93413 |
| HOLT | 8.504891 | 20.57738 | 26.74072 | 30.80756 | 34.94463 | 37.94606 |
| DAMPEN | 8.161127 | 19.23165 | 22.88949 | 26.32286 | 30.25410 | 31.27435 |
| WINTER | 8.504891 | 20.57738 | 26.74072 | 30.80756 | 34.94463 | 37.94606 |
| COMB S-H-D | 7.964892 | 19.02728 | 22.76000 | 25.56244 | 28.63649 | 30.24861 |
| B-J auto | 8.638050 | 19.71086 | 22.78263 | 26.77603 | 27.99026 | 30.82170 |
| AutoBox1 | 10.119198 | 22.51186 | 27.07629 | 31.31042 | 34.37756 | 40.08493 |
| AutoBox2 | 7.951192 | 18.21996 | 20.24227 | 21.65581 | 24.46921 | 27.17624 |
| AutoBox3 | 10.698830 | 21.89010 | 25.29647 | 28.45540 | 29.57899 | 33.62135 |
| ROBUST-Trend | 7.606495 | 18.64720 | 22.39440 | 24.83567 | 27.61491 | 30.66538 |
| ARARMA | 9.091266 | 20.68177 | 25.10429 | 30.14883 | 34.99774 | 40.38033 |
| Auto-ANN | 8.956602 | 19.67521 | 21.76107 | 24.36152 | 26.41399 | 29.81788 |
| Flors-Pearc1 | 8.561016 | 19.38149 | 22.80052 | 25.34184 | 27.62398 | 30.95579 |
| Flors-Pearc2 | 10.903332 | 21.38609 | 23.17941 | 24.91399 | 27.72512 | 31.29920 |
| PP-Autocast | 8.141452 | 19.19054 | 22.75382 | 26.17481 | 30.09973 | 31.09496 |
| ForecastPro | 8.426093 | 18.77205 | 22.10483 | 25.87735 | 27.74920 | 30.45980 |
| SMARTFCS | 9.796722 | 20.29223 | 23.64564 | 25.85210 | 28.55908 | 31.99116 |
| THETAsm | 7.907310 | 18.26210 | 21.41826 | 23.33240 | 25.61775 | 27.89275 |
| THETA | 8.172273 | 19.38538 | 22.36993 | 25.85993 | 28.69015 | 31.01968 |
| RBF | 8.146542 | 18.86758 | 21.67786 | 22.58918 | 25.20706 | 26.92872 |
Having forecasting data stored in a well-defined way is crucial for monitoring and evaluating forecasting accuracy. Although a number of large-scale forecasting competitions have been conducted, there is currently no unified approach to storing forecasting data. In this paper we aimed to present a data schema that is suitable for keeping forecasting data in a table, either as part of a relational database (RDB) or as a portable file.
We also showed how to implement various algorithms for accuracy evaluation based on the proposed data structures. The examples are given in R, but other languages (such as Python) can equally be used for tasks such as exploratory data analysis and accuracy evaluation. We hope that the solutions presented are flexible enough to be applied by academics and researchers as well as by practitioners. One aim of the paper is to highlight the need to separate forecasting data from the algorithms and tools that handle them (such as tools for viewing time series and forecasting results).