Exercise 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

library(fpp2)  # loads forecast, ggplot2 and the example data sets used below
autoplot(usnetelec)

(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(BoxCox(usnetelec,lambda))

autoplot(usgdp)

(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(BoxCox(usgdp,lambda))

autoplot(mcopper)

(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(BoxCox(mcopper,lambda))

autoplot(enplanements)

(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
autoplot(BoxCox(enplanements,lambda))
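To help interpret the lambda values above (roughly 0.52, 0.37, 0.19 and -0.23), here is a minimal base-R sketch of the Box-Cox formula for strictly positive data. The helper name `box_cox` is hypothetical and is not the forecast package's `BoxCox()`; it only illustrates that lambda = 1 leaves the shape of the series unchanged, lambda = 0 is the natural log, and values in between compress large observations progressively more.

```r
# Hypothetical helper for illustration only (positive data assumed):
#   w = log(y)                   if lambda == 0
#   w = (y^lambda - 1) / lambda  otherwise
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100, 1000)

box_cox(y, 1)     # same as y - 1: a shift only, no change in shape
box_cox(y, 0.5)   # square-root-like compression of large values
box_cox(y, 0)     # natural log
box_cox(y, -0.25) # negative lambda: stronger compression than the log
```

The closer lambda is to 1, the less the transformation changes the series; the closer it is to 0 (or below), the more strongly it damps variation at high levels.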

Exercise 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

autoplot(cangas)

(lambda <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(BoxCox(cangas,lambda))

Answer:

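The plots before and after the transformation look essentially the same, so the transformation does not help. According to Section 3.2 of our textbook: “If the data show variation that increases or decreases with the level of the series, then a transformation can be useful.” Based on the plot, the variation in the cangas data increases and then decreases in the more recent years, so it does not change consistently with the level of the series. A Box-Cox transformation can only stabilise variance that grows or shrinks steadily with the level, which is why it is unhelpful here.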
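One way to check this kind of claim numerically is to compare the yearly mean (the level) with the yearly standard deviation (the spread). The sketch below uses a hypothetical helper, `level_vs_spread`, and a synthetic monthly series rather than cangas, so the numbers are illustrative only. For the synthetic series the spread is proportional to the level, which is the situation where a log or Box-Cox transformation does help.

```r
# Hypothetical helper (not from the textbook): yearly level and spread of a
# seasonal ts object. aggregate() drops an incomplete final year.
level_vs_spread <- function(y) {
  data.frame(
    level  = as.numeric(aggregate(y, nfrequency = 1, FUN = mean)),
    spread = as.numeric(aggregate(y, nfrequency = 1, FUN = sd))
  )
}

# Synthetic monthly series (an assumption for illustration): the seasonal
# swing is proportional to a steadily rising level.
tt <- 1:120
y <- ts((100 + tt) * (1 + 0.2 * sin(2 * pi * tt / 12)),
        frequency = 12, start = c(2000, 1))

raw    <- level_vs_spread(y)
logged <- level_vs_spread(log(y))

# Ratio of the largest to the smallest yearly spread:
# a value near 1 means the variance is roughly stable.
max(raw$spread) / min(raw$spread)
max(logged$spread) / min(logged$spread)
```

With fpp2 loaded, the same helper could be applied to cangas, for example with `plot(level_vs_spread(cangas))`. If the spread rises and then falls while the level changes, no single lambda will flatten it.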

Exercise 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

#setwd("/Users/elinaazrilyan/Documents/Data624/")
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"],
  frequency=12, start=c(1982,4))
autoplot(myts)

(lambda <- BoxCox.lambda(myts))
## [1] 0.2324297
autoplot(BoxCox(myts,lambda))

Exercise 3.8

For your retail time series (from Exercise 3 in Section 2.10):

a) Split the data into two parts using:
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
b) Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

c) Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
d) Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                    ME     RMSE      MAE      MPE     MAPE     MASE
## Training set 46.37387 59.27744 47.03213 7.755343 7.848159 1.000000
## Test set     67.42917 76.67352 67.42917 4.447568 4.447568 1.433683
##                   ACF1 Theil's U
## Training set 0.8279637        NA
## Test set     0.6327744  1.004278
e) Check the residuals.
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 2099.4, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The mean of the residuals is not close to zero, so the forecasts are biased. The ACF of the residuals from the seasonal naïve method shows significant autocorrelation, suggesting our forecasts could be improved. The histogram suggests that the residuals are not normal: the distribution is right-skewed. Also, the Ljung-Box test gives a very small p-value (< 2.2e-16), indicating that the residuals are distinguishable from a white noise series.
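To make the Ljung-Box result more concrete, the following base-R sketch computes the statistic Q* = n(n+2) Σ r_k² / (n − k) by hand and converts it to a p-value using a chi-squared distribution with 24 degrees of freedom (24 lags and model df 0, as in the output above). The helper `ljung_box_q` and the simulated series are assumptions for illustration; they are not the retail residuals.

```r
# Hypothetical helper: Ljung-Box statistic over lags 1..h.
# Missing values (e.g. the leading NAs in snaive residuals) are dropped.
ljung_box_q <- function(x, h) {
  x <- x[!is.na(x)]
  n <- length(x)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]  # drop lag 0
  n * (n + 2) * sum(r^2 / (n - seq_len(h)))
}

set.seed(42)
noise   <- rnorm(300)                            # white noise
trended <- noise + seq(0, 5, length.out = 300)   # strongly autocorrelated

q_noise <- ljung_box_q(noise, h = 24)
q_trend <- ljung_box_q(trended, h = 24)

# p-values from a chi-squared distribution with h = 24 degrees of freedom
p_noise <- pchisq(q_noise, df = 24, lower.tail = FALSE)
p_trend <- pchisq(q_trend, df = 24, lower.tail = FALSE)

# Cross-check the hand calculation against stats::Box.test
bt <- Box.test(trended, lag = 24, type = "Ljung-Box")
c(manual = q_trend, box_test = unname(bt$statistic))
```

A large Q* (and hence a tiny p-value), as for the trended series here and for the seasonal naïve residuals above, means the autocorrelations are jointly too large to be consistent with white noise.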

f) How sensitive are the accuracy measures to the training/test split?
Answer:

The accuracy measures are likely to be quite sensitive to the training/test split here, since the variation in our data increases over time, as the plot below shows. Putting a higher percentage of the data in the training set would probably give a better model, but we won’t be able to evaluate a model reliably with a very small test set, and the test set needs to be at least as large as the maximum forecast horizon required. Time series cross-validation, which averages the accuracy over many forecast origins, would give a more robust estimate.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
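The time series cross-validation idea mentioned in the answer to part f) can be sketched in base R as a rolling-origin evaluation of the seasonal naïve method. The helper `snaive_cv_rmse` and the synthetic series are assumptions made for illustration; this is not the forecast package's implementation and does not use the retail data.

```r
# Hypothetical helper: rolling-origin evaluation of the seasonal naive method.
# Returns the RMSE for each horizon 1..h, averaged over all forecast origins.
snaive_cv_rmse <- function(y, h = 12, min_train = 2 * frequency(y)) {
  m <- frequency(y)
  v <- as.numeric(y)
  n <- length(v)
  stopifnot(min_train >= m, min_train <= n - h)
  origins <- seq(min_train, n - h)
  steps <- seq_len(h)
  # Seasonal naive: the forecast for time origin + j is the most recent
  # observed value from the same season.
  errs <- sapply(origins, function(end) {
    fc <- v[end + steps - m * ceiling(steps / m)]
    v[end + steps] - fc
  })
  errs <- matrix(errs, nrow = h)  # rows = horizons, columns = origins
  sqrt(rowMeans(errs^2))
}

# Synthetic monthly series (an assumption): a fixed seasonal pattern plus a
# linear trend of 0.5 per month, so every seasonal naive error is 12 * 0.5 = 6.
tt <- 1:120
y <- ts(10 * sin(2 * pi * tt / 12) + 0.5 * tt, frequency = 12)

cv_rmse <- snaive_cv_rmse(y, h = 12)
cv_rmse
```

Because the errors are averaged over every admissible origin, the result does not hinge on one particular training/test split. With the forecast package loaded, something like `tsCV(myts, snaive, h = 12)` should give the corresponding cross-validated errors for the retail series.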