library(tidyverse)
library(fpp2)
library(gridExtra)

3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

  • usnetelec
  • usgdp
  • mcopper
  • enplanements

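For reference, the Box-Cox transformation used throughout this exercise is

\[
w_t =
\begin{cases}
\log(y_t) & \text{if } \lambda = 0,\\
(y_t^{\lambda}-1)/\lambda & \text{otherwise,}
\end{cases}
\]

where BoxCox.lambda() selects \(\lambda\) (by Guerrero's method by default) and BoxCox() applies the transformation.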
usnetelec

lambda <- BoxCox.lambda(usnetelec)
lambda
## [1] 0.5167714
original <- autoplot(usnetelec) + ggtitle("Original")
transformed <- autoplot(BoxCox(usnetelec,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)

usgdp

lambda <- BoxCox.lambda(usgdp)
lambda
## [1] 0.366352
original <- autoplot(usgdp) + ggtitle("Original")
transformed <- autoplot(BoxCox(usgdp,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)

mcopper

lambda <- BoxCox.lambda(mcopper)
lambda
## [1] 0.1919047
original <- autoplot(mcopper) + ggtitle("Original")
transformed <- autoplot(BoxCox(mcopper,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)

enplanements

lambda <- BoxCox.lambda(enplanements)
lambda
## [1] -0.2269461
original <- autoplot(enplanements) + ggtitle("Original")
transformed <- autoplot(BoxCox(enplanements,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)
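As a compact cross-check of the four values above, the same \(\lambda\) estimates can be computed in one pass; this is only a restatement of the separate chunks, assuming the fpp2 series are attached as usual.

# Guerrero-method lambda for each of the four series in a single call
series_list <- list(usnetelec = usnetelec, usgdp = usgdp,
                    mcopper = mcopper, enplanements = enplanements)
sapply(series_list, BoxCox.lambda)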

3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

lambda <- BoxCox.lambda(cangas)
lambda
## [1] 0.5767759
original <- autoplot(cangas) + ggtitle("Original")
transformed <- autoplot(BoxCox(cangas,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)

Even with the best \(\lambda = 0.5767759\) selected, the variance is still not constant: the transformed series looks much the same as the original. The seasonal variation in cangas grows and then shrinks over time rather than increasing with the level of the series, so no monotonic power transformation can make it uniform. That is why a Box-Cox transformation is unhelpful here; the subseries plot sketched below makes the changing pattern easier to see.
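A quick way to visualise this, using only the packages already loaded above (ggsubseriesplot() comes from the forecast package that fpp2 attaches), is a seasonal subseries plot of the raw data; the monthly panels suggest the seasonal swings widen in the middle of the series and then narrow again, independent of the overall level.

# Seasonal subseries plot of the raw data: the seasonal swings change
# over time rather than growing with the level of the series
ggsubseriesplot(cangas) + ylab("Gas production") + ggtitle("cangas: seasonal subseries")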

3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349399C"], frequency=12, start=c(1982,4)) #clothing
autoplot(myts) + xlab('Year') + ylab('Turnover: Clothing')

lambda <- BoxCox.lambda(myts)
lambda
## [1] 0.02074707
autoplot(BoxCox(myts, lambda))

With \(\lambda \approx 0.02\), which is very close to zero and therefore close to a simple log transformation, the seasonal variation of the transformed series is roughly constant across the whole period, which makes the subsequent forecasting model simpler. The comparison below illustrates how close the two transformations are.
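Because the chosen \(\lambda\) is so close to zero, a plain log transform gives almost the same result; this is a minimal sketch using myts and lambda from the chunk above.

# lambda near 0 means Box-Cox is essentially a log transform;
# the two panels should look nearly identical
boxcox_plot <- autoplot(BoxCox(myts, lambda)) + ggtitle("Box-Cox (lambda = 0.02)")
log_plot <- autoplot(log(myts)) + ggtitle("log(myts)")
grid.arrange(boxcox_plot, log_plot, nrow = 1)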

3.8

For your retail time series (from Exercise 3 in Section 2.10):

  1. Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
  2. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

  3. Calculate forecasts using snaive applied to myts.train
fc <- snaive(myts.train)
  4. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                     ME     RMSE      MAE      MPE     MAPE     MASE
## Training set  9.007207 21.13832 16.58859 4.224080 7.494415 1.000000
## Test set     10.362500 21.50499 18.99583 2.771495 5.493632 1.145115
##                   ACF1 Theil's U
## Training set 0.5277855        NA
## Test set     0.7420700 0.3223094
  5. Check the residuals.
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 342, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

The residuals appear to be correlated.

Reason: they are not a white noise series.

  • According to the text, a large value of \(Q^*\) from the Ljung-Box test suggests that the autocorrelations do not come from a white noise series, which is the case here since \(Q^* = 342\) is very large.

  • The test is highly significant (p-value < 2.2e-16), so we reject the hypothesis that the residuals are white noise.

  • The seasonal naive forecasts therefore leave information in the residuals, and the model could be improved; there are several directions worth exploring.

The residuals also do not appear to be normally distributed: they are not centred on 0 and the histogram shows a longer left tail. A sketch of separate checks on these two properties follows.
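To test the two properties separately rather than reading them off checkresiduals(), one possible sketch is below; the lag choice of 24 (matching checkresiduals()) and the use of a Shapiro-Wilk test are choices made here for illustration, not part of the original exercise.

# Separate checks on the snaive residuals (NAs from the first seasonal
# cycle are dropped before testing)
res <- na.omit(residuals(fc))

# Ljung-Box test for autocorrelation, using 24 lags as checkresiduals() did
Box.test(res, lag = 24, type = "Ljung-Box")

# Shapiro-Wilk test for normality of the residuals
shapiro.test(res)

# Histogram of the residuals to see the skew
gghistogram(res) + ggtitle("Residuals from seasonal naive method")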

  6. How sensitive are the accuracy measures to the training/test split?
fit1 <- meanf(myts.train, h=24)
fit2 <- rwf(myts.train,h=24)
fit3 <- snaive(myts.train,h=24)
autoplot(myts) +
  autolayer(fit1, series="Mean", PI=FALSE) +
  autolayer(fit2, series="Naïve", PI=FALSE) +
  autolayer(fit3, series="Seasonal naïve", PI=FALSE) +
  xlab("Year") + ylab("Turnover") +
  ggtitle("Forecasts for Retail Turnover on Clothing") +
  guides(colour=guide_legend(title="Forecast"))

accuracy(fit1, myts.test)
##                         ME     RMSE       MAE       MPE     MAPE     MASE
## Training set -3.688274e-15  84.8532  65.83623 -16.48880 36.09754 3.968766
## Test set      1.361463e+02 152.8584 136.14634  36.57186 36.57186 8.207229
##                   ACF1 Theil's U
## Training set 0.7766742        NA
## Test set     0.1762124  2.203431
accuracy(fit2, myts.test)
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set    1.29593  53.66712  34.48081  -1.712046 15.51772  2.078586
## Test set     -185.11250 197.72838 189.03750 -56.668479 57.35239 11.395635
##                    ACF1 Theil's U
## Training set -0.2404895        NA
## Test set      0.1762124  2.992736
accuracy(fit3, myts.test)
##                     ME     RMSE      MAE      MPE     MAPE     MASE
## Training set  9.007207 21.13832 16.58859 4.224080 7.494415 1.000000
## Test set     10.362500 21.50499 18.99583 2.771495 5.493632 1.145115
##                   ACF1 Theil's U
## Training set 0.5277855        NA
## Test set     0.7420700 0.3223094

Accuracy measures are sensitive to the training/test split: moving the split point changes the error estimates, especially for methods such as the mean and naive forecasts that ignore seasonality. Based on the test-set RMSE, MAE, MAPE and MASE, the seasonal naive method is clearly the best of these three methods for this dataset.
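One way to gauge that sensitivity directly is to repeat the comparison with a different split point; the two-years-earlier cut-off below is an arbitrary choice for illustration, not part of the exercise.

# Re-fit with an earlier split (training ends 2008) and compare the
# seasonal naive test-set accuracy against the original split
myts.train2 <- window(myts, end = c(2008, 12))
myts.test2 <- window(myts, start = 2009)
fc2 <- snaive(myts.train2, h = length(myts.test2))
accuracy(fc2, myts.test2)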