This analysis uses the debitcards dataset from the fpp2 package: monthly retail debit card usage in Iceland (million ISK), covering January 2000 through August 2013.
Setup
library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Loading required package: fma
## Loading required package: expsmooth
library(ggplot2)
library(forecast)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(seasonal)
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:psych':
##
## outlier
library(tseries)
Data
str(debitcards)
## Time-Series [1:164] from 2000 to 2014: 7.2 7.33 7.81 7.41 9.14 ...
describe(debitcards)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 164 15.83 4.81 16.34 15.76 5.61 7.2 26.68 19.47 0.01 -0.89 0.38
autoplot(debitcards)
decomp = seas(debitcards, x11="")
autoplot(decomp)
logdebit = log(debitcards)
ETS on all data
ETSmod = ets(logdebit, lambda = "auto")
ETSfc = forecast(ETSmod, h = 20)
autoplot(ETSfc)
ARIMA on all data
adf.test(logdebit, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: logdebit
## Dickey-Fuller = -3.0132, Lag order = 5, p-value = 0.1536
## alternative hypothesis: stationary
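Not part of the original output, but a quick cross-check: since the ADF test cannot reject the unit-root null (p = 0.15), ndiffs() and nsdiffs() from the forecast package give a fast read on the ordinary and seasonal differencing orders that auto.arima() will consider.
ndiffs(logdebit)   # suggested number of ordinary differences (KPSS-based by default)
nsdiffs(logdebit)  # suggested number of seasonal differences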
ARIMAmod = auto.arima(logdebit, lambda = "auto")
ARIMAfc = forecast(ARIMAmod, h =20)
autoplot(ARIMAfc)
Compare Models on All Data
checkresiduals(ETSfc)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,A)
## Q* = 77.066, df = 8, p-value = 1.901e-13
##
## Model df: 16. Total lags used: 24
checkresiduals(ARIMAfc)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)(0,1,1)[12]
## Q* = 62.223, df = 21, p-value = 5.851e-06
##
## Model df: 3. Total lags used: 24
accuracy(ETSfc)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0006943777 0.04663458 0.03620388 -0.02977931 1.344776 0.4199362
## ACF1
## Training set 0.1613487
accuracy(ARIMAfc)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.002670112 0.0459881 0.03377275 -0.1020821 1.238912 0.391737
## ACF1
## Training set 0.01039682
The two models perform almost identically on the full data. The ETS model has a slightly smaller mean error, but neither model is particularly biased, and the RMSE, MAPE, and MASE values are very close. The ARIMA model leaves less autocorrelation in the residuals (ACF1 of about 0.01 versus 0.16 for ETS), although the Ljung-Box test rejects white-noise residuals for both models. I would probably choose the ARIMA model, but I don't think it's a consequential choice.
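To make that comparison easier to scan, here is a minimal sketch (not in the original output) that stacks the training-set accuracy rows of the two fitted models into a single table.
# Side-by-side in-sample accuracy for the ETS and ARIMA forecasts fitted above
rbind(ETS   = accuracy(ETSfc)["Training set", ],
      ARIMA = accuracy(ARIMAfc)["Training set", ])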
Split data
trainData = window(logdebit, end = c(2013,2))
testData = window(logdebit, start = c(2013,3))
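As a sanity check (not in the original write-up): because the series ends in August 2013, the hold-out window starting in March 2013 contains only 6 observations, so the accuracy() calls below compare just those 6 months even though the forecasts extend h = 20 steps ahead.
length(trainData)  # 158 monthly observations (Jan 2000 - Feb 2013)
length(testData)   # 6 monthly observations (Mar 2013 - Aug 2013)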
ETS on training data
ETSmod1 = ets(trainData, lambda = "auto")
ETSfc1 = forecast(ETSmod1, h =20)
autoplot(ETSfc1)
ARIMA on training data
adf.test(trainData, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: trainData
## Dickey-Fuller = -2.6974, Lag order = 5, p-value = 0.2856
## alternative hypothesis: stationary
ARIMAmod1 = auto.arima(trainData, lambda = "auto")
ARIMAfc1 = forecast(ARIMAmod1, h =20)
autoplot(ARIMAfc1)
Check Accuracy on Test Data
checkresiduals(ETSfc1)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,A)
## Q* = 60.057, df = 8, p-value = 4.542e-10
##
## Model df: 16. Total lags used: 24
checkresiduals(ARIMAfc1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)(0,1,1)[12]
## Q* = 59.155, df = 21, p-value = 1.713e-05
##
## Model df: 3. Total lags used: 24
accuracy(ETSfc1, testData)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.000400574 0.04493723 0.03422169 -0.0157327 1.285298 0.3887494
## Test set 0.005295614 0.04889675 0.04346730 0.1378463 1.401940 0.4937770
## ACF1 Theil's U
## Training set 0.06052602 NA
## Test set -0.48511869 0.4789792
accuracy(ARIMAfc1, testData)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.002719703 0.04566195 0.03319309 -0.1041334 1.225436 0.3770647
## Test set 0.004143640 0.04788763 0.04283163 0.1076254 1.384454 0.4865559
## ACF1 Theil's U
## Training set 0.006756319 NA
## Test set -0.503737111 0.4849713
The ARIMA model performs better on every test-set measure except ACF1. Neither model has a particularly large mean error, but the ARIMA model is slightly less biased, and while the RMSEs are similar, the ARIMA model has the lower MPE and MAPE. Both models beat the seasonal naive benchmark (MASE well below 1), and the ARIMA model comes out slightly ahead on MASE as well.
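As with the full-data comparison, a minimal sketch (not in the original output) that stacks the test-set accuracy rows makes the comparison easier to read off one table.
# Side-by-side out-of-sample accuracy on the March-August 2013 hold-out
rbind(ETS   = accuracy(ETSfc1, testData)["Test set", ],
      ARIMA = accuracy(ARIMAfc1, testData)["Test set", ])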