[R] Comparing time series forecasting models in action

Contents

  1. Introduction
  2. Data sets
  3. Forecasting models
  4. Contest #1: predicting sales of Australian wine
  5. Contest #2: predicting Air Passengers
  6. Contest #3: predicting unemployment in Maine
  7. Contest #4: predicting supply of Australian chocolate, electricity and beer
  8. Contest #5: predicting complaints to a motoring organisation
  9. Conclusions
  10. References

Introduction

Time series are analysed to understand the past and to predict the future, enabling managers and policy makers to make properly informed decisions. Here a subset of typical time series forecasting models is compared on several data sets, each split into a training set (70%) and a cross-validation set (30%). Performance on the cross-validation sets is compared in terms of MAE, RSS, MSE, and RMSE, computed as follows.

getPerformance = function(pred, val) {
    # residuals between predictions and held-out values
    res = pred - val
    MAE = sum(abs(res))/length(val)  # mean absolute error
    RSS = sum(res^2)                 # residual sum of squares
    MSE = RSS/length(val)            # mean squared error
    RMSE = sqrt(MSE)                 # root mean squared error
    data.frame(MAE, RSS, MSE, RMSE)
}


splitTrainXvat = function(tser, perc_train) {
    # split a ts object into a training set (the first perc_train share of
    # the observations) and a validation set (the remainder)
    ntrain = floor(length(as.vector(tser)) * perc_train)
    nval = length(as.vector(tser)) - ntrain

    ttrain = ts(as.vector(tser[1:ntrain]), start = start(tser), frequency = frequency(tser))
    # the validation set starts one sampling interval after the training set ends
    tval = ts(as.vector(tser[ntrain + 1:nval]), start = tsp(ttrain)[2] + deltat(tser), 
        frequency = frequency(tser))

    stopifnot(length(ttrain) == ntrain)
    stopifnot(length(tval) == nval)

    list(ttrain, tval)
}
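
As a quick sanity check (a minimal sketch; the 70/30 split matches the contests below), the two helpers can be combined with a naive seasonal forecast that simply repeats the last observed year of the training set:

data(AirPassengers)
parts = splitTrainXvat(AirPassengers, 0.7)
# naive forecast: recycle the last 12 training observations over the validation period
naive = rep(tail(as.vector(parts[[1]]), 12), length.out = length(parts[[2]]))
getPerformance(naive, as.vector(parts[[2]]))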

Data sets

Models are compared on the following data sets:

Sales of Australian wine

The data in the file wine.dat are monthly sales of Australian wine by category, in thousands of litres, from January 1980 until July 1995. The categories are fortified white, dry white, sweet white, red, rose, and sparkling. The sweet white wine time series is plotted below: there is a dramatic increase in sales in the second half of the 1980s, followed by a reduction to a level well above the starting values.

# URLs of the data files, keyed by file name
hurl <- new.env(hash = T, parent = emptyenv())
hurl[["maine.dat"]] = "http://elena.aut.ac.nz/~pcowpert/ts/Maine.dat"
hurl[["wine.dat"]] = "http://elena.aut.ac.nz/~pcowpert/ts/wine.dat"
hurl[["motororg.dat"]] = "http://elena.aut.ac.nz/~pcowpert/ts/motororg.dat"
hurl[["cbe.dat"]] = "http://elena.aut.ac.nz/~pcowpert/ts/cbe.dat"

www = hurl[["wine.dat"]]
wine.dat = read.table(www, header = T)
attach(wine.dat)
sweetw.ts = ts(sweetw, start = c(1980, 1), freq = 12)
detach(wine.dat)
plot(sweetw.ts, xlab = "Time (months)", ylab = "sales (1000 litres)")


Below is a decomposition into trend and seasonal effects using a moving average method.

plot(decompose(sweetw.ts))


Air Passengers

The number of international passenger bookings (in thousands) per month on an airline (Pan Am) in the United States was obtained from the Federal Aviation Administration for the period 1949-1960 (Brown, 1963). The company used the data to predict future demand before ordering new aircraft and training aircrew.

data(AirPassengers)
AP <- AirPassengers
plot(AP, ylab = "Passengers (1000's)")


Below is a decomposition into trend and seasonal effects using a moving average method.

plot(decompose(AP))


A boxplot gives a summary of the values for each season.

boxplot(AP ~ cycle(AP))


Unemployment: Maine

Unemployment rates are one of the main economic indicators used by politicians and other decision makers. For example, they influence policies for regional development and welfare provision. Here I consider the monthly unemployment rate for the US state of Maine from January 1996 until August 2006.

www = hurl[["maine.dat"]]
Maine.month = read.table(www, header = T)
attach(Maine.month)
Maine.month.ts <- ts(unemploy, start = c(1996, 1), freq = 12)
detach(Maine.month)
Maine.annual.ts <- aggregate(Maine.month.ts)/12
par(mfrow = c(2, 1))
plot(Maine.month.ts, ylab = "unemployed (%)", xlab = "Time (months)")
plot(Maine.annual.ts, ylab = "unemployed (%)", xlab = "Time (years)")


Below is a decomposition into trend and seasonal effects using a moving average method.

plot(decompose(Maine.month.ts))


Supply of Australian Chocolate, Electricity and Beer

The data are the monthly supply of electricity (millions of kWh), beer (Ml), and chocolate-based production (tonnes) in Australia over the period January 1958 to December 1990, obtained from the Australian Bureau of Statistics (ABS).

www = hurl[["cbe.dat"]]
CBE <- read.table(www, header = T)
attach(CBE)
Choc.ts <- ts(choc, start = c(1958, 1), freq = 12)
Elec.ts <- ts(elec, start = c(1958, 1), freq = 12)
Beer.ts <- ts(beer, start = c(1958, 1), freq = 12)
detach(CBE)
plot(cbind(Elec.ts, Beer.ts, Choc.ts), xlab = "Time (months)", main = "Choc (tonnes), Elec (kWh), Beer (Ml)")


Below is a decomposition of the chocolate supply into trend and seasonal effects using a moving average method.

plot(decompose(Choc.ts))


Below is a decomposition of the electricity supply into trend and seasonal effects using a moving average method.

plot(decompose(Elec.ts))


Below is a decomposition of the beer supply into trend and seasonal effects using a moving average method.

plot(decompose(Beer.ts))


Complaints to a motoring organisation

The data are the number of letters of complaint received each month by a motoring organisation over the four years 1996 to 1999. At the beginning of the year 2000, the organisation wished to estimate the current level of complaints and investigate any trend in the level of complaints.

www = hurl[["motororg.dat"]]
Motor.dat = read.table(www, header = T)
attach(Motor.dat)
Comp.ts <- ts(complaints, start = c(1996, 1), freq = 12)
detach(Motor.dat)
plot(Comp.ts, xlab = "Time / months", ylab = "Complaints")


Below is a decomposition into trend and seasonal effects using a moving average method.

plot(decompose(Comp.ts))


Forecasting models

The following forecasting models are compared:

Autoregressive models

The series \( \{x_t\} \) is an autoregressive process of order p, abbreviated to AR(p), if

\( x_t = a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_p x_{t-p} + w_t \)

where \( w_t \) is white noise and \( a_1, \dots, a_p \) are the model parameters. Further details in [3]. AR(p) models will be fitted to the data using the R ar function.
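
As a minimal sketch (on the AirPassengers series loaded above), ar selects the order p by AIC, and the fitted model can be used directly for forecasting:

mod.ar = ar(AP)  # order p chosen by AIC
mod.ar$order
pred.ar = predict(mod.ar, n.ahead = 12)  # list with components $pred and $se
ts.plot(AP, pred.ar$pred, lty = 1:2)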

Holt-Winters models

The Holt-Winters method was suggested by Holt (1957) and Winters (1960), who were working in the School of Industrial Administration at Carnegie Institute of Technology. It uses exponentially weighted moving averages to update estimates of the seasonally adjusted mean (called the level), slope, and seasonals. Further details in [3]. Here HW models with both additive and multiplicative seasonal components are considered. HW models will be fitted to the data using the R HoltWinters function.
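
As an illustration (a minimal sketch on the AirPassengers series loaded above), both seasonal variants can be fitted and their forecasts compared:

hw.add = HoltWinters(AP, seasonal = "add")
hw.mul = HoltWinters(AP, seasonal = "mul")
ts.plot(AP, predict(hw.add, n.ahead = 24), predict(hw.mul, n.ahead = 24), 
    col = 1:3, lty = 1:3)
legend("topleft", c("AP", "HW.add", "HW.mul"), col = 1:3, lty = 1:3)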

Linear regression models with and without seasonal features

A model for a time series \( \{x_t\} \) is linear if it can be expressed as

\( x_t = a_0 + a_1 u_{1,t} + a_2 u_{2,t} + \dots + a_m u_{m,t} + z_t \)

where \( u_{i,t} \) is the value of the i-th predictor (or explanatory) variable at time t (i = 1, …, m; t = 1, …, n), \( z_t \) is the error at time t, and \( a_0, a_1, \dots, a_m \) are the model parameters, which can be estimated by least squares.

Linear models for time series are non-stationary when they include functions of time. Differencing can often transform a non-stationary series with a deterministic trend into a stationary one.
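
For example (a minimal sketch with simulated data), the first difference of a linear trend plus white noise fluctuates around the slope of the trend and is stationary:

set.seed(1)
x = ts(5 + 0.1 * (1:200) + rnorm(200))  # deterministic trend plus white noise
par(mfrow = c(2, 1))
plot(x, main = "Linear trend plus white noise")
plot(diff(x), main = "First difference: stationary around b = 0.1")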

A seasonal indicator model for a time series \( \{x_t\} \) containing \( s \) seasons and a trend \( m_t \) is given by

\( x_t = m_t + s_t + z_t \)

where \( s_t = b_i \) when t falls in the i-th season (t = 1, …, n; i = 1, …, s) and \( z_t \) is the residual error series, which may be autocorrelated. Further details in [3]. Linear regression models with and without seasonal features are fitted to the data and used to make predictions as follows.

buildLinearRegSeas = function(myts) {
    Time = time(myts)
    Seas = cycle(myts)
    # linear trend only, and linear trend plus seasonal indicator variables
    lm = lm(myts ~ Time)
    lmSeas = lm(myts ~ 0 + Time + factor(Seas))
    list(lmSeas, lm)
}
predictLinearRegSeas = function(valts, regBoundle, freq = 12) {
    lm = regBoundle[[2]]
    lmSeas = regBoundle[[1]]

    new.t = as.vector(time(valts))

    pred.lm = lm$coeff[1] + lm$coeff[2] * new.t
    # recycle the 12 seasonal coefficients over the validation period;
    # note this assumes the validation set starts in the first season
    beta = c(rep(coef(lmSeas)[2:13], floor(length(valts)/freq)), coef(lmSeas)[2:((length(valts)%%freq) + 
        1)])
    pred.lmSeas = lmSeas$coeff[1] * new.t + beta

    list(pred.lmSeas, pred.lm)
}

Harmonic seasonal models

Seasonal effects often vary smoothly over the seasons, so it may be more parameter-efficient to use a smooth function instead of separate indices. Sine and cosine functions can be used to build smooth variation into a seasonal model, e.g. a sine wave with frequency f (cycles per sampling interval) and amplitude A. Further details in [3]. Where used, harmonic seasonal models are fitted to the data and used to make predictions as follows.

buildHarmonicModel = function(myts) {
    Time = time(myts)
    terms = c("Time", "I(Time^2)", "COS[,1]", "SIN[,1]", "COS[,2]", "SIN[,2]", 
        "COS[,3]", "SIN[,3]", "COS[,4]", "SIN[,4]", "COS[,5]", "SIN[,5]", "COS[,6]", 
        "SIN[,6]")
    # harmonics at 1 to 6 cycles per year
    SIN = COS = matrix(nrow = length(myts), ncol = 6)
    for (i in 1:6) {
        COS[, i] = cos(2 * pi * i * Time)
        SIN[, i] = sin(2 * pi * i * Time)
    }
    mod.all = lm(myts ~ Time + I(Time^2) + COS[, 1] + SIN[, 1] + COS[, 2] + 
        SIN[, 2] + COS[, 3] + SIN[, 3] + COS[, 4] + SIN[, 4] + COS[, 5] + SIN[, 
        5] + COS[, 6] + SIN[, 6])
    # keep the terms whose t-ratio exceeds 2 in absolute value
    # (the intercept's t-ratio is dropped to align tscore with terms)
    tscore = (coef(mod.all)/sqrt(diag(vcov(mod.all))))[-1]
    fmla <- as.formula(paste("myts ~ ", paste(terms[abs(tscore) > 2], collapse = "+")))
    mod = lm(fmla)
    # fit an AR model to the residuals, which may be autocorrelated
    mod.res.ar = ar(resid(mod), method = "mle")
    list(mod.res.ar, mod)
}

predictHarmonicModel = function(valts, boundle) {
    mod = boundle[[2]]
    mod.res.ar = boundle[[1]]

    Time.val = time(valts)
    SIN = COS = matrix(nrow = length(valts), ncol = 6)
    for (i in 1:6) {
        COS[, i] = cos(2 * pi * i * Time.val)
        SIN[, i] = sin(2 * pi * i * Time.val)
    }
    new.t.scal = Time.val
    res.ar = predict(mod.res.ar, n.ahead = length(valts))
    # note: the terms below are hard-coded to the model selected for the
    # wine data (Time, Time^2 and the first three sine harmonics); a
    # different selection would require matching changes here
    pred = mod$coeff[1] + mod$coeff[2] * new.t.scal + mod$coeff[3] * I(new.t.scal^2) + 
        mod$coeff[4] * SIN[, 1] + mod$coeff[5] * SIN[, 2] + mod$coeff[6] * SIN[, 
        3]
    pred.res.ar = as.vector(pred) + as.vector(res.ar$pred)
    list(pred, pred.res.ar)
}

Seasonal Autoregressive Integrated Moving Average models (SARIMA) and log-transformed SARIMA models

Differencing a series \( x_t \) can remove trends, whether these trends are stochastic, as in a random walk, or deterministic, as in the case of a linear trend. In the case of a random walk, \( x_t = x_{t-1} + w_t \), the first-order differenced series is white noise \( w_t \) (i.e., \( \mathrm{diff}(x_t) = x_t - x_{t-1} = w_t \)) and so is stationary. In contrast, if \( x_t = a + bt + w_t \), a linear trend with white noise errors, then \( \mathrm{diff}(x_t) = x_t - x_{t-1} = b + w_t - w_{t-1} \), which is a stationary moving average process rather than white noise. Notice that the consequence of differencing a linear trend with white noise is an MA(1) process, whereas subtraction of the trend would give white noise. As the differenced series needs to be aggregated (or 'integrated') to recover the original series, the underlying stochastic process is called autoregressive integrated moving average, abbreviated to ARIMA. The ARIMA process can be extended to include seasonal terms, giving a non-stationary seasonal ARIMA (SARIMA) process.
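
As a minimal sketch, a SARIMA(0,1,1)(0,1,1) model with period 12, the specification often called the airline model, can be fitted to the AirPassengers series with the arima function:

fit.sarima = arima(AP, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), 
    period = 12))
ts.plot(AP, predict(fit.sarima, n.ahead = 24)$pred, lty = 1:2)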

Many non-linear models can be transformed to linear models. For example, an exponential model for the series \( x_t \) can be transformed by taking natural logarithms to obtain a linear model for the series \( y_t \):

\( y_t = \log(x_t) = a_0 + a_1 t + z_t \)

Further details in [3]. SARIMA models and log-transformed SARIMA models are fitted to the data and used to make predictions as follows.

get.best.arima <- function(x.ts, maxord = c(1, 1, 1, 1, 1, 1)) {
    # exhaustive search over (p, d, q)(P, D, Q) up to maxord, ranked by a
    # consistent AIC; method = "CSS" keeps each individual fit fast
    best.aic <- 1e+08
    best.fit <- NULL
    best.model <- NULL
    n <- length(x.ts)
    for (p in 0:maxord[1]) 
    for (d in 0:maxord[2]) 
    for (q in 0:maxord[3]) 
    for (P in 0:maxord[4]) 
    for (D in 0:maxord[5]) 
    for (Q in 0:maxord[6]) {

        tryCatch({
            fit <- arima(x.ts, order = c(p, d, q), seas = list(order = c(P, 
                D, Q), frequency(x.ts)), method = "CSS")
            fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
            if (fit.aic < best.aic) {
                best.aic <- fit.aic
                best.fit <- fit
                best.model <- c(p, d, q, P, D, Q)
            }

        }, error = function(e) {
            # skip parameter combinations for which arima() fails to converge
        })
    }
    list(best.aic, best.fit, best.model)
}
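
A direct call looks as follows (a minimal sketch; with maxord = c(2, 2, 2, 2, 2, 2) the search fits up to 3^6 = 729 candidate models, so it may take a while):

best = get.best.arima(log(AP), maxord = c(2, 2, 2, 2, 2, 2))
best[[3]]  # orders c(p, d, q, P, D, Q) of the best model
best[[1]]  # its consistent AIC value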

Contest #1: predicting sales of Australian wine

Using the above definitions, let's see how these forecasting models perform in predicting sales of Australian wine.

compareModels = function(contestName, ts_train, ts_val, doPlot = T, harm = F) {
    ###### models
    mod.ar = ar(ts_train)
    mod.hw.mul = HoltWinters(ts_train, seasonal = "mul")
    mod.hw.add = HoltWinters(ts_train, seasonal = "add")
    regBoundle = buildLinearRegSeas(ts_train)
    mod.reg = regBoundle[[2]]    # linear trend only ("Reg")
    mod.reg.2 = regBoundle[[1]]  # trend plus seasonal indicators ("Reg.seas")
    mod.arima <- get.best.arima(ts_train, maxord = c(2, 2, 2, 2, 2, 2))[[2]]
    mod.arima.log <- get.best.arima(log(ts_train), maxord = c(2, 2, 2, 2, 2, 
        2))[[2]]

    models = list(mod.ar, mod.hw.mul, mod.hw.add, mod.reg, mod.reg.2, mod.arima, 
        mod.arima.log)

    if (harm) {
        harmonicBoundle = buildHarmonicModel(ts_train)
        mod.reg.3 = harmonicBoundle[[2]]
        mod.reg.3.res.ar = harmonicBoundle[[1]]
        models = list(mod.ar, mod.hw.mul, mod.hw.add, mod.reg, mod.reg.2, mod.reg.3, 
            mod.reg.3.res.ar, mod.arima, mod.arima.log)
    }

    ###### predictions
    pred.ar = predict(mod.ar, n.ahead = length(ts_val))
    pred.hw.mul = predict(mod.hw.mul, n.ahead = length(ts_val))
    pred.hw.add = predict(mod.hw.add, n.ahead = length(ts_val))
    predRegBoundle = predictLinearRegSeas(ts_val, regBoundle)
    pred.reg = predRegBoundle[[2]]
    pred.reg.2 = predRegBoundle[[1]]
    pred.arima <- predict(mod.arima, n.ahead = length(ts_val))$pred
    pred.arima.log <- exp(predict(mod.arima.log, n.ahead = length(ts_val))$pred)

    ####### performance
    perf.ar = cbind(type = c("AR"), getPerformance(as.vector(pred.ar$pred), 
        as.vector(ts_val)))
    perf.hw.mul = cbind(type = c("HW.mul"), getPerformance(as.vector(pred.hw.mul), 
        as.vector(ts_val)))
    perf.hw.add = cbind(type = c("HW.add"), getPerformance(as.vector(pred.hw.add), 
        as.vector(ts_val)))
    perf.reg = cbind(type = c("Reg"), getPerformance(as.vector(pred.reg), as.vector(ts_val)))
    perf.reg.2 = cbind(type = c("Reg.seas"), getPerformance(as.vector(pred.reg.2), 
        as.vector(ts_val)))
    perf.arima = cbind(type = c("SARIMA"), getPerformance(as.vector(pred.arima), 
        as.vector(ts_val)))
    perf.arima.log = cbind(type = c("SARIMA.log"), getPerformance(as.vector(pred.arima.log), 
        as.vector(ts_val)))

    perf = rbind(perf.ar, perf.hw.mul, perf.hw.add, perf.reg, perf.reg.2, perf.arima, 
        perf.arima.log)

    if (harm) {
        predHarmonicBoundle = predictHarmonicModel(ts_val, harmonicBoundle)
        pred.reg.3 = predHarmonicBoundle[[1]]
        pred.reg.3.res.ar = predHarmonicBoundle[[2]]

        perf.reg.3 = cbind(type = c("Reg.harm"), getPerformance(as.vector(pred.reg.3), 
            as.vector(ts_val)))
        perf.reg.3.res.ar = cbind(type = c("Reg.harm.res.ar"), getPerformance(as.vector(pred.reg.3.res.ar), 
            as.vector(ts_val)))

        perf = rbind(perf.ar, perf.hw.mul, perf.hw.add, perf.reg, perf.reg.2, 
            perf.reg.3, perf.reg.3.res.ar, perf.arima, perf.arima.log)
    }

    if (doPlot) {
        ts_all = ts(c(as.vector(ts_train), as.vector(ts_val)), start = start(ts_train), 
            frequency = frequency(ts_train))
        ts.plot(ts_all, pred.ar$pred, pred.hw.mul, pred.hw.add, ts(pred.reg, 
            start = start(ts_val), frequency = frequency(ts_all)), col = 1:5, 
            lty = 1:5)
        legend("topleft", c(contestName, "AR", "HW.mul", "HW.add", "Reg"), lty = 1:5, 
            col = 1:5)

        if (!harm) {
            ts.plot(ts_all, ts(pred.reg.2, start = start(ts_val), frequency = frequency(ts_all)), 
                ts(pred.arima, start = start(ts_val), frequency = frequency(ts_all)), 
                ts(pred.arima.log, start = start(ts_val), frequency = frequency(ts_all)), 
                col = 1:4, lty = 1:4)
            legend("topleft", c(contestName, "Reg.seas", "SARIMA", "SARIMA.log"), 
                lty = 1:4, col = 1:4)
        } else {
            ts.plot(ts_all, ts(pred.reg.2, start = start(ts_val), frequency = frequency(ts_all)), 
                ts(pred.reg.3, start = start(ts_val), frequency = frequency(ts_all)), 
                ts(pred.reg.3.res.ar, start = start(ts_val), frequency = frequency(ts_all)), 
                ts(pred.arima, start = start(ts_val), frequency = frequency(ts_all)), 
                ts(pred.arima.log, start = start(ts_val), frequency = frequency(ts_all)), 
                col = 1:6, lty = 1:6)
            legend("topleft", c(contestName, "Reg.seas", "Reg.harm", "Reg.harm.seas.ar", 
                "SARIMA", "SARIMA.log"), lty = 1:6, col = 1:6)
        }
    }

    list(perf[order(perf$MAE), ], models[order(perf$MAE)])
}

####### the contest
data = splitTrainXvat(sweetw.ts, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("Sales of Australian wine", ts.train, ts.val, doPlot = T, 
    harm = T)


comparisons[1]
## [[1]]
##              type    MAE     RSS   MSE   RMSE
## 9      SARIMA.log  40.52  148576  2607  51.05
## 1              AR  51.10  216306  3795  61.60
## 3          HW.add  93.84  643785 11294 106.28
## 2          HW.mul 103.80  745160 13073 114.34
## 7 Reg.harm.res.ar 109.92  846813 14856 121.89
## 6        Reg.harm 132.05 1184123 20774 144.13
## 8          SARIMA 238.91 4624878 81138 284.85
## 5        Reg.seas 247.32 3962293 69514 263.65
## 4             Reg 250.46 3913747 68662 262.03

The Winner is:

## [1] "SARIMA.log"

The partial autocorrelations of the residuals of SARIMA.log (the winning model) are almost those of white noise: there is one (marginally) significant value near lag 0 and another near lag 0.8 (the lag axis is in years, as the data are monthly). SARIMA, on the other hand, has only one (marginally) significant value near lag 1.4.

mod.arima.log <- get.best.arima(log(ts.train), maxord = c(2, 2, 2, 2, 2, 2))[[2]]
mod.arima <- get.best.arima(ts.train, maxord = c(2, 2, 2, 2, 2, 2))[[2]]

pacf(mod.arima.log$res)


pacf(mod.arima$res)


The question here is how significant these components really are, since they are only just above the statistical threshold. Hence, let's try to fit the residual series of SARIMA.log with an AR model of order up to 8.

pred.arima.log <- exp(predict(mod.arima.log, n.ahead = length(ts.val))$pred)
pred.arima <- predict(mod.arima, n.ahead = length(ts.val))$pred

res.ar <- ar(mod.arima.log$res, order.max = 8, method = "mle")
pred.res.ar <- predict(res.ar, n.ahead = length(ts.val))$pred

pred.arima.log.new <- as.vector(pred.arima.log) + as.vector(pred.res.ar)

ts_all = ts(c(as.vector(ts.train), as.vector(ts.val)), start = start(ts.train), 
    frequency = frequency(ts.train))
ts.plot(ts_all, ts(pred.arima, start = start(ts.val), frequency = 12), ts(pred.arima.log, 
    start = start(ts.val), frequency = 12), ts(pred.arima.log.new, start = start(ts.val), 
    frequency = 12), col = 1:4, lty = 1:4)
legend("topleft", c("Sales of Australian wine", "SARIMA", "SARIMA.log", "SARIMA.log.new"), 
    lty = 1:4, col = 1:4)



perf.sarima.log.new = getPerformance(pred.arima.log.new, ts.val)
perf.sarima.log.new
##     MAE    RSS  MSE  RMSE
## 1 40.52 148578 2607 51.06

The performance of SARIMA.log.new is essentially equal to that of SARIMA.log, confirming that the latter, being the simpler of the two, already provides a good fit to the data.

Contest #2: predicting Air Passengers

Using the above definitions, let's see how these forecasting models perform in predicting Air Passengers.

data = splitTrainXvat(AP, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("Air Passengers", ts.train, ts.val, doPlot = T)


comparisons[1]
## [[1]]
##         type    MAE     RSS     MSE   RMSE
## 3     HW.add  20.89   33315   757.2  27.52
## 2     HW.mul  41.88  124268  2824.3  53.14
## 7 SARIMA.log  50.16  145357  3303.6  57.48
## 4        Reg  54.87  251403  5713.7  75.59
## 6     SARIMA  65.00  237190  5390.7  73.42
## 5   Reg.seas  69.43  308366  7008.3  83.72
## 1         AR 166.82 1614680 36697.3 191.57

The Winner is:

## [1] "HW.add"

Contest #3: predicting unemployment in Maine

Using the above definitions, let's see how these forecasting models perform in predicting unemployment in Maine.

data = splitTrainXvat(Maine.month.ts, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("% unemployment", ts.train, ts.val, doPlot = T)


comparisons[1]
## [[1]]
##         type    MAE     RSS     MSE   RMSE
## 1         AR 0.2486   3.557 0.09121 0.3020
## 3     HW.add 0.3055   5.077 0.13019 0.3608
## 7 SARIMA.log 0.6302  19.898 0.51021 0.7143
## 2     HW.mul 0.8293  37.574 0.96345 0.9816
## 4        Reg 1.2826  74.873 1.91982 1.3856
## 5   Reg.seas 1.3045 107.816 2.76450 1.6627
## 6     SARIMA 1.4469 105.791 2.71260 1.6470

The Winner is:

## [1] "AR"

Contest #4: predicting supply of Australian chocolate, electricity and beer

Using the above definitions, let's see how these forecasting models perform in predicting the supply of Australian chocolate, electricity and beer.

Australian chocolate

data = splitTrainXvat(Choc.ts, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("Choc (tonnes)", ts.train, ts.val, doPlot = T)


comparisons[1]
## [[1]]
##         type  MAE       RSS     MSE RMSE
## 6     SARIMA 1171 245432590 2062459 1436
## 5   Reg.seas 1279 327021702 2748082 1658
## 3     HW.add 1341 319590318 2685633 1639
## 7 SARIMA.log 1409 356587175 2996531 1731
## 4        Reg 1417 338681191 2846060 1687
## 2     HW.mul 1493 387680124 3257816 1805
## 1         AR 2308 884618516 7433769 2726

The Winner is:

## [1] "SARIMA"

Australian electricity

data = splitTrainXvat(Elec.ts, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("Elec (kwh)", ts.train, ts.val, doPlot = T)


comparisons[1]
## [[1]]
##         type    MAE       RSS      MSE   RMSE
## 6     SARIMA  276.8 1.359e+07   114228  338.0
## 2     HW.mul  342.9 2.552e+07   214477  463.1
## 3     HW.add  353.3 2.543e+07   213677  462.3
## 7 SARIMA.log 1039.6 1.674e+08  1406599 1186.0
## 5   Reg.seas 1051.4 2.007e+08  1686316 1298.6
## 4        Reg 1105.8 2.298e+08  1931265 1389.7
## 1         AR 4356.0 2.983e+09 25068502 5006.8

The Winner is:

## [1] "SARIMA"

Australian beer

data = splitTrainXvat(Beer.ts, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("Beer (Ml)", ts.train, ts.val, doPlot = T)


comparisons[1]
## [[1]]
##         type   MAE    RSS    MSE  RMSE
## 3     HW.add 10.68  22897  192.4 13.87
## 2     HW.mul 10.84  23378  196.5 14.02
## 1         AR 15.02  46326  389.3 19.73
## 7 SARIMA.log 22.26  80037  672.6 25.93
## 6     SARIMA 26.68 141905 1192.5 34.53
## 5   Reg.seas 39.05 236582 1988.1 44.59
## 4        Reg 40.93 256686 2157.0 46.44

The Winner is:

## [1] "HW.add"

Contest #5: predicting complaints to a motoring organisation

Using the above definitions, let's see how these forecasting models perform in predicting complaints to a motoring organisation.

data = splitTrainXvat(Comp.ts, 0.7)
ts.train = data[[1]]
ts.val = data[[2]]
comparisons = compareModels("Complaints", ts.train, ts.val, doPlot = T)


comparisons[1]
## [[1]]
##         type       MAE       RSS       MSE      RMSE
## 1         AR     5.424 6.809e+02 4.539e+01     6.737
## 4        Reg     6.335 8.478e+02 5.652e+01     7.518
## 2     HW.mul     6.469 8.987e+02 5.991e+01     7.740
## 3     HW.add     6.752 9.958e+02 6.639e+01     8.148
## 5   Reg.seas     7.841 1.163e+03 7.753e+01     8.805
## 6     SARIMA    20.124 8.541e+03 5.694e+02    23.862
## 7 SARIMA.log 14875.857 4.562e+10 3.041e+09 55145.311

The Winner is:

## [1] "AR"

Conclusions

In conclusion, there is no one-size-fits-all forecasting model, although SARIMA, HW.add, and AR tend to perform better than the other models. A brief summary of the results follows.

Data set                            Winner       2nd place
Sales of Australian wine            SARIMA.log   AR
Air Passengers                      HW.add       HW.mul
Unemployment: Maine                 AR           HW.add
Supply of Australian Chocolate      SARIMA       Reg.seas
Supply of Australian Electricity    SARIMA       HW.mul
Supply of Australian Beer           HW.add       HW.mul
Complaints to a motoring org.       AR           Reg

References

  1. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, 2009
  2. G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013
  3. Paul S.P. Cowpertwait, Andrew V. Metcalfe, Introductory Time Series with R, Springer, 2009