library(tidyr)
library(dplyr)
library(knitr)
library(utils)
library(ggplot2)
library(forecast)
library(fpp2)
library(readxl)
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
- usnetelec
- usgdp
- mcopper
- enplanements
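Recall that the Box-Cox transformation is defined as w_t = log(y_t) when lambda = 0 and w_t = (y_t^lambda - 1)/lambda otherwise. As a minimal sketch (assuming strictly positive data, which holds for all four series), a hand-rolled version should agree with forecast::BoxCox:
# Hand-rolled Box-Cox transform (sketch; assumes y > 0)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
all.equal(as.numeric(box_cox(usnetelec, 0.5)),
          as.numeric(BoxCox(usnetelec, 0.5)))  # should be TRUE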
# usnetelec: annual US net electricity generation
lambda <- BoxCox.lambda(usnetelec)
autoplot(BoxCox(usnetelec, lambda)) +
  xlab(paste("lambda =", lambda))
# usgdp: quarterly US GDP
lambda <- BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, lambda)) +
  xlab(paste("lambda =", lambda))
# mcopper: monthly copper prices
lambda <- BoxCox.lambda(mcopper)
autoplot(BoxCox(mcopper, lambda)) +
  xlab(paste("lambda =", lambda))
# enplanements: monthly US domestic enplanements
lambda <- BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements, lambda)) +
  xlab(paste("lambda =", lambda))
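Whichever lambda is chosen, forecasts produced on the transformed scale must be back-transformed before they are reported; forecast::InvBoxCox reverses the transformation. A quick round-trip sanity check:
# Round trip: InvBoxCox(BoxCox(y)) should recover the original series
lambda <- BoxCox.lambda(usnetelec)
all.equal(as.numeric(InvBoxCox(BoxCox(usnetelec, lambda), lambda)),
          as.numeric(usnetelec))  # should be TRUE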
Why is a Box-Cox transformation unhelpful for the cangas data?
# cangas: monthly Canadian gas production
lambda <- BoxCox.lambda(cangas)
autoplot(BoxCox(cangas, lambda)) +
  xlab(paste("lambda =", lambda))
In the case of the cangas data, even after applying a Box-Cox transformation, the graph above shows that the size of the seasonal variation still changes over time: it is small in the early years, largest in the middle of the series, and smaller again towards the end. A Box-Cox transformation can only handle variation that increases or decreases monotonically with the level of the series, so no choice of lambda can stabilise the variance here, and the purpose of the transformation is not achieved.
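Plotting the untransformed series makes the problem easier to see:
# The raw cangas series: seasonal variation grows and then shrinks again,
# independently of the level, which no power transformation can remove
autoplot(cangas)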
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
# Monthly retail data from Exercise 3 in Section 2.10
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))
autoplot(myts)
lambda <- BoxCox.lambda(myts)
autoplot(BoxCox(myts, lambda)) +
  xlab(paste("lambda =", lambda))
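In practice it is common to round the estimated lambda to a simple, interpretable value. A sketch comparing the estimate with a plain log transform (Box-Cox with lambda = 0); if the two plots look similar, the simpler log transform is usually preferable:
# View the data under a log transform (lambda = 0) for comparison
autoplot(BoxCox(myts, 0)) +
  xlab("lambda = 0 (log transform)")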
For your retail time series (from Exercise 3 in Section 2.10):
a. Split the data into two parts using
myts.train <- window(myts, end = c(2010, 12))  # training set: up to Dec 2010
myts.test <- window(myts, start = 2011)  # test set: Jan 2011 onwards
b. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
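A quick numeric check (a sketch) that the two pieces partition the original series with nothing dropped or duplicated:
# The training and test sets should jointly cover the full series
length(myts.train) + length(myts.test) == length(myts)  # should be TRUE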
c. Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
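As a sanity check (a minimal sketch), note that seasonal naive point forecasts simply repeat the most recent observed seasonal cycle, so the first forecast year should equal the last training year:
# Each snaive forecast equals the observation from the same month a year earlier
all(as.numeric(head(fc$mean, 12)) == as.numeric(tail(myts.train, 12)))  # should be TRUE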
d. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc, myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.772973 20.24576 15.95676 4.702754 8.109777 1.000000
## Test set 55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
## ACF1 Theil's U
## Training set 0.7385090 NA
## Test set 0.5315239 1.297866
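Note that the training-set MASE is exactly 1 by construction: for monthly data, MASE scales every error by the in-sample MAE of the one-step seasonal naive forecasts, which is the very method being evaluated here. A minimal sketch of the scaling:
# The MASE denominator is the mean absolute seasonal (lag-12) difference
# of the training data, i.e. the in-sample seasonal naive MAE
mase_scale <- mean(abs(diff(myts.train, lag = 12)))
mean(abs(residuals(fc)), na.rm = TRUE) / mase_scale  # equals 1 for snaive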
e. Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
No. Since the Ljung-Box p-value (< 2.2e-16) is far below 0.05, we reject the null hypothesis that the residuals are uncorrelated: they show substantial autocorrelation (the training-set ACF1 of about 0.74 points the same way). The residual histogram produced by checkresiduals also suggests that they are not normally distributed.
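The same conclusion can be reached directly with stats::Box.test (a sketch; lag = 24 matches the "Total lags used" reported by checkresiduals, and fitdf = 0 because the seasonal naive method estimates no parameters):
# Reproduce the Ljung-Box test by hand; the first 12 snaive residuals
# are NA, so drop them before testing
Box.test(na.omit(residuals(fc)), lag = 24, type = "Ljung-Box", fitdf = 0)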
f. How sensitive are the accuracy measures to the training/test split?
We can repeat the exercise with a different train/test split to see how much the accuracy measures change.
myts.train2 <- window(myts, end = c(2005, 12))  # earlier split: up to Dec 2005
myts.test2 <- window(myts, start = 2006)
fc2 <- snaive(myts.train2)
accuracy(fc2, myts.test2)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.365568 20.04803 15.85714 5.745128 8.661106 1.0000000
## Test set 10.337500 19.09667 13.87083 3.151397 4.490635 0.8747372
## ACF1 Theil's U
## Training set 0.7202258 NA
## Test set 0.4852215 0.404645
Comparing the two splits, the test-set accuracy changes dramatically: the test-set MASE falls from about 3.50 to 0.87 and the MAPE from about 15.1% to 4.5%. The accuracy measures are therefore highly sensitive to the choice of training/test split.
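To probe this sensitivity more systematically, one can repeat the comparison over several candidate split years (a quick sketch; the years chosen here are arbitrary):
# Test-set MASE of the seasonal naive method for a few arbitrary split years
for (yr in c(2000, 2005, 2010)) {
  train <- window(myts, end = c(yr, 12))
  test <- window(myts, start = yr + 1)
  fc_yr <- snaive(train, h = length(test))
  cat(yr, "-> test MASE:", accuracy(fc_yr, test)["Test set", "MASE"], "\n")
}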