library(tidyr)
library(dplyr)
library(knitr)
library(utils)
library(ggplot2)
library(forecast)
library(fpp2)
library(readxl)
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
- usnetelec
- usgdp
- mcopper
- enplanements
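Recall that the Box-Cox transformation is defined as w_t = log(y_t) when lambda = 0 and w_t = (y_t^lambda - 1)/lambda otherwise. As a minimal sketch (assuming strictly positive data, which holds for all four series), a hand-rolled version should agree with forecast::BoxCox:
# Hand-rolled Box-Cox transform (sketch; assumes y > 0)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
all.equal(as.numeric(box_cox(usnetelec, 0.5)),
          as.numeric(BoxCox(usnetelec, 0.5)))  # should be TRUE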
# usnetelec: annual US net electricity generation
lambda <- BoxCox.lambda(usnetelec)
autoplot(BoxCox(usnetelec, lambda)) +
  xlab(paste("lambda =", lambda))
# usgdp: quarterly US GDP
lambda <- BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, lambda)) +
  xlab(paste("lambda =", lambda))
# mcopper: monthly copper prices
lambda <- BoxCox.lambda(mcopper)
autoplot(BoxCox(mcopper, lambda)) +
  xlab(paste("lambda =", lambda))
# enplanements: monthly US domestic enplanements
lambda <- BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements, lambda)) +
  xlab(paste("lambda =", lambda))
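Whichever lambda is chosen, forecasts produced on the transformed scale must be back-transformed before they are reported; forecast::InvBoxCox reverses the transformation. A quick round-trip sanity check:
# Round trip: InvBoxCox(BoxCox(y)) should recover the original series
lambda <- BoxCox.lambda(usnetelec)
all.equal(as.numeric(InvBoxCox(BoxCox(usnetelec, lambda), lambda)),
          as.numeric(usnetelec))  # should be TRUE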
Why is a Box-Cox transformation unhelpful for the cangas data?
# cangas: monthly Canadian gas production
lambda <- BoxCox.lambda(cangas)
autoplot(BoxCox(cangas, lambda)) +
  xlab(paste("lambda =", lambda))
In the case of the cangas data, even after applying a Box-Cox transformation, the graph above shows that the size of the seasonal variation still changes over time: it is small in the early years, largest in the middle of the series, and smaller again towards the end. A Box-Cox transformation can only handle variation that increases or decreases monotonically with the level of the series, so no choice of lambda can stabilise the variance here, and the purpose of the transformation is not achieved.
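Plotting the untransformed series makes the problem easier to see:
# The raw cangas series: seasonal variation grows and then shrinks again,
# independently of the level, which no power transformation can remove
autoplot(cangas)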
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
# Monthly retail data from Exercise 3 in Section 2.10
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))
autoplot(myts)
lambda <- BoxCox.lambda(myts)
autoplot(BoxCox(myts, lambda)) +
  xlab(paste("lambda =", lambda))
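In practice it is common to round the estimated lambda to a simple, interpretable value. A sketch comparing the estimate with a plain log transform (Box-Cox with lambda = 0); if the two plots look similar, the simpler log transform is usually preferable:
# View the data under a log transform (lambda = 0) for comparison
autoplot(BoxCox(myts, 0)) +
  xlab("lambda = 0 (log transform)")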
For your retail time series (from Exercise 3 in Section 2.10):
a. Split the data into two parts using
myts.train <- window(myts, end = c(2010, 12))  # training set: up to Dec 2010
myts.test <- window(myts, start = 2011)  # test set: Jan 2011 onwards
b. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
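A quick numeric check (a sketch) that the two pieces partition the original series with nothing dropped or duplicated:
# The training and test sets should jointly cover the full series
length(myts.train) + length(myts.test) == length(myts)  # should be TRUE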
c. Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
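As a sanity check (a minimal sketch), note that seasonal naive point forecasts simply repeat the most recent observed seasonal cycle, so the first forecast year should equal the last training year:
# Each snaive forecast equals the observation from the same month a year earlier
all(as.numeric(head(fc$mean, 12)) == as.numeric(tail(myts.train, 12)))  # should be TRUE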
d. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc, myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.772973 20.24576 15.95676 4.702754 8.109777 1.000000
## Test set 55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
## ACF1 Theil's U
## Training set 0.7385090 NA
## Test set 0.5315239 1.297866
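Note that the training-set MASE is exactly 1 by construction: for monthly data, MASE scales every error by the in-sample MAE of the one-step seasonal naive forecasts, which is the very method being evaluated here. A minimal sketch of the scaling:
# The MASE denominator is the mean absolute seasonal (lag-12) difference
# of the training data, i.e. the in-sample seasonal naive MAE
mase_scale <- mean(abs(diff(myts.train, lag = 12)))
mean(abs(residuals(fc)), na.rm = TRUE) / mase_scale  # equals 1 for snaive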
e. Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
No. Since the Ljung-Box p-value (< 2.2e-16) is far below 0.05, we reject the null hypothesis that the residuals are uncorrelated: they show substantial autocorrelation (the training-set ACF1 of about 0.74 points the same way). The residual histogram produced by checkresiduals also suggests that they are not normally distributed.
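The same conclusion can be reached directly with stats::Box.test (a sketch; lag = 24 matches the "Total lags used" reported by checkresiduals, and fitdf = 0 because the seasonal naive method estimates no parameters):
# Reproduce the Ljung-Box test by hand; the first 12 snaive residuals
# are NA, so drop them before testing
Box.test(na.omit(residuals(fc)), lag = 24, type = "Ljung-Box", fitdf = 0)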
f. How sensitive are the accuracy measures to the training/test split?
We can repeat the exercise with a different train/test split to see how much the accuracy measures change.
myts.train2 <- window(myts, end = c(2005, 12))  # earlier split: up to Dec 2005
myts.test2 <- window(myts, start = 2006)
fc2 <- snaive(myts.train2)
accuracy(fc2, myts.test2)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.365568 20.04803 15.85714 5.745128 8.661106 1.0000000
## Test set 10.337500 19.09667 13.87083 3.151397 4.490635 0.8747372
## ACF1 Theil's U
## Training set 0.7202258 NA
## Test set 0.4852215 0.404645
Comparing the two splits, the test-set accuracy changes dramatically: the test-set MASE falls from about 3.50 to 0.87 and the MAPE from about 15.1% to 4.5%. The accuracy measures are therefore highly sensitive to the choice of training/test split.
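To probe this sensitivity more systematically, one can repeat the comparison over several candidate split years (a quick sketch; the years chosen here are arbitrary):
# Test-set MASE of the seasonal naive method for a few arbitrary split years
for (yr in c(2000, 2005, 2010)) {
  train <- window(myts, end = c(yr, 12))
  test <- window(myts, start = yr + 1)
  fc_yr <- snaive(train, h = length(test))
  cat(yr, "-> test MASE:", accuracy(fc_yr, test)["Test set", "MASE"], "\n")
}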