Data Schemas for Forecasting (with examples in R)

Overview

Existing approaches used in forecasting competitions

A number of well-known forecasting competitions exist: M1, M2, M3, M4, and others.

For some of the competitions the data is available in the form of R packages:

Mcomp: Data from the M-competition and M3-competition
M4comp2018: Data from the M4-competition
Tcomp: Data from the Kaggle tourism competition
tscompdata: Data from the NN3 and NN5 competitions

How the above packages store forecasting data

Time series are provided as a list of objects. Each series within this list is of class Mdata with the following structure:

Field name	Description
sn	Name of the series
st	Series number and period. For example “Y1” denotes first yearly series, “Q20” denotes 20th quarterly series and so on
n	The number of observations in the time series
h	The number of required forecasts
description	A short description of the time series
period	Interval of the time series
type	The type of series
x	A time series of length n (the historical data)
xx	A time series of length h (the future data)
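As a rough illustration of the structure above, the following sketch builds one series object by hand as a plain R list. It is a stand-in and is not loaded from Mcomp. The observations are the Y1 values shown in the TSTS example later in these slides. The split into five historical and three future points, the "YEARLY" label, and the series name are assumptions made only for this example.

```r
# Hand-built stand-in for one series object (a plain list, not a real Mdata object).
y1 <- list(
  sn          = "example-series",  # hypothetical series name
  st          = "Y1",              # first yearly series
  n           = 5L,                # number of historical observations
  h           = 3L,                # number of required forecasts
  period      = "YEARLY",          # assumed interval label
  type        = NA_character_,     # not known for this illustration
  description = NA_character_,     # not known for this illustration
  x  = ts(c(3103.96, 3360.27, 3807.63, 4387.88, 4936.99), start = 1984),
  xx = ts(c(5379.75, 6158.68, 6876.58), start = 1989)
)

# The lengths of x and xx should agree with n and h.
stopifnot(length(y1$x) == y1$n, length(y1$xx) == y1$h)

# Join history and future into a single yearly series.
full <- ts(c(y1$x, y1$xx), start = start(y1$x), frequency = frequency(y1$x))
full
```

Note that the historical and the future parts live in two separate fields, so any tool that needs the whole series has to join them first.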

Forecasts are provided as a list of data frames, one per forecasting method. Within each data frame, every row holds the forecasts for one series and is named after that series. There are 18 columns, one per forecast horizon; if a series has fewer than 18 forecasts, the rest of the row is filled with NA values.
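The wide, NA-padded layout described above can be sketched as follows. This is a hand-built imitation, not data loaded from any package. The method names A and B and the forecast values are borrowed from the FDTS example later in these slides, and the helper `pad` is hypothetical.

```r
max_h <- 18  # fixed number of forecast columns

# Hypothetical helper: pad a forecast vector with NA up to max_h values.
pad <- function(f) c(f, rep(NA_real_, max_h - length(f)))

# One data frame per method; each row is one series, named by its identifier.
forecasts <- list(
  A = as.data.frame(rbind(Y1 = pad(c(5406.43, 5875.96, 6345.48)))),
  B = as.data.frame(rbind(Y1 = pad(c(5473.87, 6010.43, 6546.63))))
)

dim(forecasts$A)                     # 1 row, 18 columns
sum(!is.na(forecasts$A["Y1", ]))     # forecasts actually available for Y1
```

In this layout the forecast origin and the target period are implicit: they can only be recovered by looking up the matching series object.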

New data schemas for forecasting tasks

A. Time series table schema (TSTS):

Field name (column name)	Description	Example
series_id	Time series identifier - a unique name that identifies a time series	“Y1”
timestamp	Any representation of the period to which the observation relates	“01.01.1997” for daily data, “Sep 1997” for monthly data, “Week 49, 1997” for weekly data
value	The value observed	1000
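A list-of-series layout can be flattened into a TSTS table with a few lines of base R. The sketch below is one possible way to do it. The function name `to_tsts` is hypothetical, and the input is a small hand-built list with truncated values taken from the example output in these slides, not a real competition data set.

```r
# Hand-built input: each element holds a yearly series in the field x.
series_list <- list(
  Y1 = list(x = ts(c(3103.96, 3360.27, 3807.63, 4387.88, 4936.99), start = 1984)),
  Y2 = list(x = ts(c(5389.80, 5384.40), start = 1984))
)

# Hypothetical helper: one row per observation with series_id, timestamp, value.
to_tsts <- function(series_list) {
  rows <- lapply(names(series_list), function(id) {
    x <- series_list[[id]]$x
    data.frame(
      series_id = id,
      timestamp = as.numeric(time(x)),  # the year, for yearly data
      value     = as.numeric(x),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

tsts <- to_tsts(series_list)
head(tsts, 10)
```

Because every observation is a row of its own, series of different lengths fit in the same table without padding.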

B. Forecast dynamic table schemas (FDTS):

Field name (column name)	Description
series_id	Time series ID for which the forecast was calculated
method	Method identifier
timestamp	Any representation of the period to which the forecast relates
origin_timestamp	Origin of the forecast (provided in a timestamp format)
horizon	Forecast horizon
variable	The name of the variable that describes the forecasting result
value	The value of the variable

C. Forecast Tables Schema (FTS):

series_id	method	timestamp	origin_timestamp	horizon	forecast	Lo90	Hi90
…	…	…	…	…	…	…	…
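The two forecast schemas carry the same information in different shapes: FTS has one column per result (forecast, Lo90, Hi90), while FDTS has one row per result, identified by the variable column. The sketch below converts a small FTS table into FDTS form. It assumes that the FDTS variable names are simply the FTS column names, and it uses the Y1 values for method A from the examples in these slides.

```r
# A small FTS table (wide form).
fts <- data.frame(
  series_id        = "Y1",
  method           = "A",
  timestamp        = 1989:1991,
  origin_timestamp = 1988,
  horizon          = 1:3,
  forecast         = c(5406.43, 5875.96, 6345.48),
  Lo90             = c(5183.349, 5652.879, 6122.399),
  Hi90             = c(5629.511, 6099.041, 6568.561)
)

id_cols    <- c("series_id", "method", "timestamp", "origin_timestamp", "horizon")
value_cols <- setdiff(names(fts), id_cols)  # forecast, Lo90, Hi90

# Stack one block of rows per result column (long form).
fdts <- do.call(rbind, lapply(value_cols, function(v) {
  cbind(fts[id_cols], variable = v, value = fts[[v]])
}))

fdts
```

The long form accepts any new kind of result (another interval level, a quantile) as extra rows, whereas the wide form needs a new column for each.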

Examples:

Time series table schema (TSTS):

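# Read the TSTS example file (the path is local to the author's machine)
TSTS <- read.csv("C:/Users/svcuo/Desktop/data/TSTS.csv")
TSTS$X <- NULL  # drop the extra X column (likely a saved row index)
head(TSTS, 10)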

series_id   value    timestamp
  Y1        3103.96     1984    
  Y1        3360.27     1985    
  Y1        3807.63     1986    
  Y1        4387.88     1987    
  Y1        4936.99     1988    
  Y1        5379.75     1989    
  Y1        6158.68     1990    
  Y1        6876.58     1991    
  Y2        5389.80     1984    
  Y2        5384.40     1985

Forecast dynamic table schemas (FDTS):

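# Read the FDTS example file (the path is local to the author's machine)
FDTS <- read.csv("C:/Users/svcuo/Desktop/data/FDTS.csv")
FDTS$X <- NULL  # drop the extra X column (likely a saved row index)
head(FDTS, 10)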

series_id   actual    method  timestamp   origin_timestamp  horizon    variable  value
  Y1        5379.75       A       1989            1988          1         forecast  5406.43
  Y1        6158.68       A       1990            1988          2         forecast  5875.96
  Y1        6876.58       A       1991            1988          3         forecast  6345.48
  Y1        5379.75       B       1989            1988          1         forecast  5473.87
  Y1        6158.68       B       1990            1988          2         forecast  6010.43
  Y1        6876.58       B       1991            1988          3         forecast  6546.63
  Y1        5379.75       C       1989            1988          1         forecast  5406.43
  Y1        6158.68       C       1990            1988          2         forecast  5875.96
  Y1        6876.58       C       1991            1988          3         forecast  6345.48
  Y2        4793.20       A       1989            1988          1         forecast  4142.60

Forecast Tables Schema (FTS):

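# Read the FTS example file (the path is local to the author's machine)
FTS <- read.csv("C:/Users/svcuo/Desktop/data/FTS.csv")
FTS$X <- NULL  # drop the extra X column (likely a saved row index)
head(FTS, 10)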

series_id  actual  method timestamp origin_timestamp forecast  horizon    Lo90    Hi90
  Y1       5379.75    A       1989        1988        5406.43       1   5183.349 5629.511
  Y1       6158.68    A       1990        1988        5875.96       2   5652.879 6099.041
  Y1       6876.58    A       1991        1988        6345.48       3   6122.399 6568.561
  Y1       5379.75    B       1989        1988        5473.87       1   5250.789 5696.951
  Y1       6158.68    B       1990        1988        6010.43       2   5787.349 6233.511
  Y1       6876.58    B       1991        1988        6546.63       3   6323.549 6769.711
  Y1       5379.75    C       1989        1988        5406.43       1   5183.349 5629.511
  Y1       6158.68    C       1990        1988        5875.96       2   5652.879 6099.041
  Y1       6876.58    C       1991        1988        6345.48       3   6122.399 6568.561
  Y2       4793.20    A       1989        1988        4142.60       1   3919.519 4365.681

Examples in R (Exploratory analysis of forecast)

Prediction-Realization Diagram:
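The original figure is not reproduced here. As a minimal sketch, a diagram of this kind can be drawn from an FTS-shaped table by plotting forecasts against actuals together with the line of perfect forecasts. The data are the Y1 rows for methods A and B from the FTS example; the axis assignment and plotting symbols are choices made for this illustration.

```r
fts <- data.frame(
  method   = rep(c("A", "B"), each = 3),
  actual   = rep(c(5379.75, 6158.68, 6876.58), times = 2),
  forecast = c(5406.43, 5875.96, 6345.48, 5473.87, 6010.43, 6546.63)
)

# Use the same limits on both axes so the diagonal is meaningful.
lims <- range(c(fts$actual, fts$forecast))

plot(fts$actual, fts$forecast,
     xlim = lims, ylim = lims,
     pch  = ifelse(fts$method == "A", 1, 2),
     xlab = "Actual", ylab = "Forecast",
     main = "Prediction-realization diagram")
abline(0, 1, lty = 2)  # points on this line are perfect forecasts
legend("topleft", legend = c("A", "B"), pch = c(1, 2), title = "method")
```

Points below the dashed line are forecasts that fell short of the actual value.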

Fanchart:
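The original figure is not reproduced here. The sketch below draws a simple chart of this kind in base R for series Y1 and method A, using the history from the TSTS example and the forecast with its 90% bounds from the FTS example. It shows a single interval band; a full fan chart would usually overlay several levels, which the example data do not contain.

```r
hist_t <- 1984:1988
hist_v <- c(3103.96, 3360.27, 3807.63, 4387.88, 4936.99)

fc_t <- 1989:1991
fc   <- c(5406.43, 5875.96, 6345.48)
lo90 <- c(5183.349, 5652.879, 6122.399)
hi90 <- c(5629.511, 6099.041, 6568.561)

# Empty canvas wide enough for the history and the interval band.
plot(c(hist_t, fc_t), c(hist_v, fc), type = "n",
     ylim = range(hist_v, lo90, hi90),
     xlab = "Year", ylab = "Value", main = "Y1, method A")

# Shaded 90% band, then the history and the point forecast on top.
polygon(c(fc_t, rev(fc_t)), c(lo90, rev(hi90)), col = "grey85", border = NA)
lines(hist_t, hist_v)
lines(fc_t, fc, lty = 2)
```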

Examples in R (Accuracy)

             horizon = 1 horizon = 2 horizon = 3 horizon = 4 horizon = 5 horizon = 6  
NAIVE2          8.360053    19.23712    21.70531    23.45871    25.17578    27.35164      
SINGLE          8.426719    19.53460    21.70985    23.59725    25.35748    27.93413  
HOLT            8.504891    20.57738    26.74072    30.80756    34.94463    37.94606      
DAMPEN          8.161127    19.23165    22.88949    26.32286    30.25410    31.27435    
WINTER          8.504891    20.57738    26.74072    30.80756    34.94463    37.94606      
COMB S-H-D      7.964892    19.02728    22.76000    25.56244    28.63649    30.24861      
B-J auto        8.638050    19.71086    22.78263    26.77603    27.99026    30.82170      
AutoBox1       10.119198    22.51186    27.07629    31.31042    34.37756    40.08493      
AutoBox2        7.951192    18.21996    20.24227    21.65581    24.46921    27.17624      
AutoBox3       10.698830    21.89010    25.29647    28.45540    29.57899    33.62135    
ROBUST-Trend    7.606495    18.64720    22.39440    24.83567    27.61491    30.66538      
ARARMA          9.091266    20.68177    25.10429    30.14883    34.99774    40.38033      
Auto-ANN        8.956602    19.67521    21.76107    24.36152    26.41399    29.81788      
Flors-Pearc1    8.561016    19.38149    22.80052    25.34184    27.62398    30.95579      
Flors-Pearc2   10.903332    21.38609    23.17941    24.91399    27.72512    31.29920      
PP-Autocast     8.141452    19.19054    22.75382    26.17481    30.09973    31.09496      
ForecastPro     8.426093    18.77205    22.10483    25.87735    27.74920    30.45980      
SMARTFCS        9.796722    20.29223    23.64564    25.85210    28.55908    31.99116      
THETAsm         7.907310    18.26210    21.41826    23.33240    25.61775    27.89275      
THETA           8.172273    19.38538    22.36993    25.85993    28.69015    31.01968      
RBF             8.146542    18.86758    21.67786    22.58918    25.20706    26.92872
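A table of this shape (methods in rows, horizons in columns) can be computed directly from an FTS-shaped table that also carries the actual values. The source does not name the error measure behind the table above, so the sketch below uses the mean absolute percentage error (MAPE) purely as an assumed example, on the six Y1 rows for methods A and B from the FTS example.

```r
fts <- data.frame(
  series_id = "Y1",
  method    = rep(c("A", "B"), each = 3),
  horizon   = rep(1:3, times = 2),
  actual    = rep(c(5379.75, 6158.68, 6876.58), times = 2),
  forecast  = c(5406.43, 5875.96, 6345.48, 5473.87, 6010.43, 6546.63)
)

# Absolute percentage error of every single forecast.
fts$ape <- 100 * abs(fts$actual - fts$forecast) / abs(fts$actual)

# Average over series for each method and horizon.
mape <- tapply(fts$ape, list(method = fts$method, horizon = fts$horizon), mean)
colnames(mape) <- paste("horizon =", colnames(mape))

round(mape, 2)
```

With only one series in the example, each cell is the error of a single forecast; on a full data set the same call averages over all series.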

Examples in R (Validation of PIs)
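The content of this slide is not reproduced here. One common way to validate prediction intervals is to compare their nominal level with the empirical coverage, that is, the share of actual values that fall inside the interval. The sketch below does this per method on the six Y1 rows for methods A and B from the FTS example; it is an assumed illustration and not necessarily the check used in the original slide.

```r
fts <- data.frame(
  method  = rep(c("A", "B"), each = 3),
  horizon = rep(1:3, times = 2),
  actual  = rep(c(5379.75, 6158.68, 6876.58), times = 2),
  Lo90    = c(5183.349, 5652.879, 6122.399, 5250.789, 5787.349, 6323.549),
  Hi90    = c(5629.511, 6099.041, 6568.561, 5696.951, 6233.511, 6769.711)
)

# TRUE where the actual value lies inside the 90% interval.
fts$covered <- fts$actual >= fts$Lo90 & fts$actual <= fts$Hi90

# Share of covered actuals per method, to be compared with the nominal 0.90.
coverage <- tapply(fts$covered, fts$method, mean)
coverage
```

On these six rows the coverage is 1/3 for method A and 2/3 for method B. With so few observations the numbers are illustrative only; a meaningful check needs many series and origins.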

Conclusions

Storing forecasting data in a well-defined way is crucial for monitoring and evaluating forecasting accuracy. Although a number of large-scale forecasting competitions have been conducted, there is at present no unified approach to storing forecasting data. In this paper we presented data schemas suitable for keeping forecasting data in a table, either as part of a relational database (RDB) or as a portable file.
We also showed how to implement various algorithms for accuracy evaluation based on the proposed data structures. The examples are in R, but other languages (such as Python) can equally be used for tasks such as exploratory data analysis and accuracy evaluation. We hope the solutions presented are flexible enough to be applied by academics, researchers, and practitioners alike. One aim of the paper is to highlight the need to separate forecasting data from the algorithms and tools that handle it (such as tools for viewing time series and forecasting results).