According to the Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/forecast), to forecast means "to say that (something) will happen in the future" or "to predict (something, such as weather) after looking at the information that is available".
Forecasting is a form of prediction. In time series analysis we often want to predict something in the future on the basis of what we have observed in the past. Evaluating the accuracy of our predictions is an important part of model evaluation: a model with poor accuracy will be of little practical use.
What follows is an exercise I have done based on the online book by Hyndman and Athanasopoulos, "Forecasting: principles and practice", which is a very good read. You may find the book here: https://www.otexts.org/fpp
You may find details on the topic (accuracy in forecasting) in the same book: https://www.otexts.org/fpp/2/5
The code is a reworking of code I found on that site and on another site of Rob J. Hyndman (http://robjhyndman.com/hyndsight/).
Of course, all responsibility is mine: if you find errors or mistakes, they are my responsibility and not Hyndman's or Athanasopoulos'.
I would appreciate it if you reported any errors to me at the Twitter address below.
Let’s start!
# first load the packages needed to publish the file on RPubs
# you do not need these libraries to perform the analyses
library(knitr)
library(rmarkdown)
## Warning: package 'rmarkdown' was built under R version 3.1.3
The example is based on the dataset “AirPassengers”
# load the necessary library
library(forecast)
## Warning: package 'forecast' was built under R version 3.1.3
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: timeDate
## This is forecast 6.1
data(AirPassengers)
# inspect the series
AirPassengers
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
### the data are assigned to a convenient vector
### this is an easy way to avoid changing the code every time
series <- AirPassengers
# plot the series
plot(series, col="darkblue", ylab="Passengers on airplanes")
# plot the seasonal distribution of the series
windows(width=800,height=350) # open a graphics window with the dimensions you need (Windows only)
boxplot(split(series, cycle(series)), names = month.abb, col = "gold")
The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.
The size of the test set is typically about 20% of the total sample.
So, we will split the series into a training set and a test set.
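As a side note, the split used below keeps 8 years (96 observations) for training and 4 years (48 observations) for testing, which is closer to a third of the sample than to 20%. For illustration only (my own addition, not from the book), a roughly 20% holdout could be computed for a generic monthly series like this:
n <- length(series)                        # 144 monthly observations
n_test <- ceiling(0.2 * n)                 # about 29 observations for the test set
t_all <- time(series)                      # time index of the series
train_20 <- window(series, end = t_all[n - n_test])
test_20 <- window(series, start = t_all[n - n_test + 1])
length(test_20) / n                        # roughly 0.2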
### training set
### use data from 1949 to 1956 for forecasting
sr = window(series, start=1949, end=c(1956,12))
### test set
### use remaining data from 1957 to 1960 to test accuracy
ser = window(series, start=1957, end=c(1960,12))
Now we are ready to start.
Let’s rock the boat!
######################################################################
# plot training set
######################################################################
plot(sr, main="AirPassengers", ylab="", xlab="Months")
# plot forecasts for 4 years (h = 48 months) according to four methods
lines(meanf(sr,h=48)$mean, col=4)
lines(rwf(sr,h=48)$mean, col=2)
lines(rwf(sr,drift=TRUE,h=48)$mean, col=3)
lines(snaive(sr,h=48)$mean, col=5)
# legend
legend("topleft", lty=1, col=c(4,2,3, 5),
legend=c("Mean method","Naive method","Drift method", "Seasonal naïve method"),bty="n"
# the test set
lines(ser, col="red")
# accuracy of the forecasts from sr (training set) evaluated against ser (original data used as test set)
# the best model has the lowest error, particularly the MAPE (mean absolute percentage error); a manual MAPE check follows the accuracy tables below
# Mean method
accuracy(meanf(sr,h=48), ser)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set -9.442e-15 71.54 59.07 -11.68 30.89 2.023 0.9282 NA
## Test set 1.998e+02 214.34 199.77 46.60 46.60 6.841 0.7915 4.358
# Naive method
accuracy(rwf(sr,h=48), ser)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 2.042 23.33 18.69 0.5236 8.716 0.6402 0.2472 NA
## Test set 107.479 132.61 107.73 23.5372 23.620 3.6891 0.7915 2.528
# Drift method
accuracy(rwf(sr,drift=TRUE,h=48), ser)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set -1.122e-14 23.24 18.61 -0.5356 8.713 0.6373 0.2472 NA
## Test set 5.745e+01 87.67 63.89 11.7513 13.729 2.1879 0.7296 1.646
# Seasonal naïve method
accuracy(snaive(sr,h=48), ser)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 28.80 32.73 29.20 12.49 12.69 1.000 0.7740 NA
## Test set 85.23 97.80 85.23 19.59 19.59 2.919 0.8754 1.932
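To make explicit what accuracy() reports, here is a manual computation of the test-set MAPE for the mean method (my own addition); it should match the 46.60 shown above.
fc.mean <- meanf(sr, h=48)$mean            # mean-method forecasts over the test period
mean(abs((ser - fc.mean) / ser)) * 100     # MAPE: mean of |actual - forecast| / actual, times 100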
######################################################################
# plot test set only with the predictions
######################################################################
# calculate the forecasting
sr.mean <- meanf(sr,h=48)$mean
sr.naive <- rwf(sr,h=48)$mean
sr.drift <- rwf(sr,drift=TRUE,h=48)$mean
sr.seas <- snaive(sr,h=48)$mean
# plot the test set
plot(ser, main="AirPassengers", ylab="", xlab="Months", ylim = c(200,600))
# plot the 4-year forecasts according to the four methods
lines(sr.mean, col=4)
lines(sr.naive, col=2)
lines(sr.drift, col=3)
lines(sr.seas, col=5)
# legend
legend("topleft", lty=1, col=c(4,2,3,5),
legend=c("Mean method","Naive method","Drift method", "Seasonal naïve method"),bty="n"
It is rather obvious that none of these methods produces a good forecast of the series.
The mean method and the naive method capture neither the trend nor the seasonality of the series. The drift method captures the trend but not the seasonality, while the seasonal naïve method does the reverse. The best method, on the basis of the mean absolute percentage error (MAPE), is the drift method, which in my view suggests that the trend is more important than the seasonality in this series.
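One quick way to check this intuition (my own addition, not part of the book's exercise) is to decompose the training set into trend, seasonal and remainder components and compare their relative magnitudes.
# classical multiplicative decomposition of the training set
plot(decompose(sr, type="multiplicative"))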
########################################################################
# for ARIMA, Hyndman suggests using auto.arima() without stepwise selection
########################################################################
library(fpp)
## Loading required package: fma
## Loading required package: tseries
## Loading required package: expsmooth
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 3.1.3
trainData <- sr
testData <- ser
# the default value in auto.arima() is test="kpss".
# A KPSS test has a null hypothesis of stationarity
# In general, all the defaults are set to the values that give the best forecasts on average.
# CAUTION! Takes a while to compute
arimaMod <- auto.arima(trainData, stepwise=FALSE, approximation=FALSE)
arimaMod.Fr <-forecast(arimaMod,h=48)
# plot of the prediction and of the test set
plot(arimaMod.Fr)
lines(testData, col="red")
legend("topleft",lty=1,bty = "n",col=c("red","blue"),c("testData","ARIMAPred"))
# plot of the test set and its prediction only
AR.mean <-forecast(arimaMod,h=48)$mean
plot(testData, main="AirPassengers", ylab="", xlab="Months", col="darkblue")
lines(AR.mean, col="red")
# accuracy
accuracy(arimaMod.Fr,testData)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 0.7965 8.551 6.092 0.2873 2.768 0.2086 0.01757 NA
## Test set -7.8983 26.104 21.080 -2.6868 5.186 0.7218 0.65516 0.5771
# inspect the residuals of the ARIMA model
tsdisplay(residuals(arimaMod))
# Choosing the lag of the test
# Hyndman recommends lag = 10 for non-seasonal data and lag = 2m for seasonal data, where m is the period of seasonality.
# for a seasonality of 12 months, lag = 2 * 12 = 24
# see: http://robjhyndman.com/hyndsight/ljung-box-test/
lb <- Box.test(residuals(arimaMod), lag = 24, type = "Ljung-Box")
lb
##
## Box-Ljung test
##
## data: residuals(arimaMod)
## X-squared = 22.94, df = 24, p-value = 0.5232
bp <- Box.test(residuals(arimaMod), lag = 24, type = "Box-Pierce")
bp
##
## Box-Pierce test
##
## data: residuals(arimaMod)
## X-squared = 18.55, df = 24, p-value = 0.7754
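A small refinement (my own addition): when the residuals come from a fitted model, Hyndman suggests reducing the degrees of freedom of the test by the number of estimated parameters, which Box.test() supports through its fitdf argument. The conclusion does not change here.
k <- length(arimaMod$coef)   # number of estimated coefficients (may include a drift term, if present)
Box.test(residuals(arimaMod), lag = 24, type = "Ljung-Box", fitdf = k)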
########################################################
#### Residuals diagnostics in forecasting
########################################################
res.fr <- residuals(arimaMod.Fr)
par(mfrow=c(1,3))
plot(res.fr, main="Residuals from ARIMA method",
ylab="", xlab="Years")
Acf(res.fr, main="ACF of residuals")
u <- residuals(arimaMod)
m<-mean(u)
std<-sqrt(var(u))
hist(u, breaks=20, col="gray", prob=TRUE,
xlab="Residuals", main="Histogram of residuals\n with Normal Curve")
curve(dnorm(x, mean=m, sd=std),
col="black", lwd=2, add=TRUE)
The ARIMA model has the lowest MAPE of all the forecasting methods, and it is obvious from the plot that the prediction based on the ARIMA model captures both the trend and the seasonality of the series.
The residuals of the model look reasonably good, and the Ljung-Box test shows no evidence of serial correlation.
The superiority of ARIMA over the other forecasting methods depends, in part, on ARIMA including a term for differencing.
Indeed, the ‘AirPassengers’ series is not stationary, as can be shown with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. The null hypothesis of the test is that the series is stationary, so a p-value < 0.05 (the conventional threshold for rejecting the null, for what that is worth) indicates that the series is not stationary.
We will use the test as implemented in the ‘tseries’ package.
library(tseries)
kpss.test(series)
## Warning: p-value smaller than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: series
## KPSS Level = 4.342, Truncation lag parameter = 2, p-value = 0.01
The test gives p-value = 0.01, so we can conclude that the series is not stationary. To make it stationary we should difference it. To find out how many times the series must be differenced, we can use the ‘ndiffs’ function and then use the number it returns as the ‘differences’ argument of ‘diff’.
ndiffs(series)
## [1] 1
series.diff <- diff(series, differences=1)
# you may automate the procedure by assigning the result of 'ndiffs' to a variable and using it for the 'differences' argument of 'diff'
# d <- ndiffs(series); series.diff <- diff(series, differences=d)
# check whether the differenced series is stationary
kpss.test(series.diff)
## Warning: p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: series.diff
## KPSS Level = 0.0115, Truncation lag parameter = 2, p-value = 0.1
# Now the series is stationary, as can be seen by comparison with the original series
# the RColorBrewer package can be used to create a graded set of colors
library(RColorBrewer)
rb<-brewer.pal(7,"Blues")
# paired plot with enlarged window
windows(width=1200,height=350)
par(mfrow=c(1,2))
plot(series, col=rb[4], xlab = "Original Series", ylab="Passengers on airplanes")
plot(series.diff, col=rb[7], xlab = "Differenced Series", ylab="Passengers on airplanes")
Anyway, this was not the point of the exercise. I wanted to compare some methods of forecasting and see how the accuracy of the models can be measured. Indeed, even after differencing the series, the forecasts from the other methods remain poor, with the seasonal naïve method being the closest to the target.
I hope you’ve enjoyed this!