Assignment 2

3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.

  • usnetelec
  • usgdp
  • mcopper
  • enplanements
library(fpp2)   # loads forecast and ggplot2, plus the data sets used below
(usnetelec_lb <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(BoxCox(usnetelec,usnetelec_lb))

(usgdp_lb <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(BoxCox(usgdp,usgdp_lb))

(mcopper_lb <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(BoxCox(mcopper,mcopper_lb))

(enplanements_lb <- BoxCox.lambda(enplanements))
## [1] -0.2269461
autoplot(BoxCox(enplanements,enplanements_lb))
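For reference, the transformation that BoxCox() applies to positive data is w = log(y) when λ = 0 and w = (y^λ - 1)/λ otherwise. The base-R sketch below implements that definition so the λ values above can be checked by hand. The helper names box_cox() and inv_box_cox() are hypothetical and are not part of the forecast package.

```r
# Box-Cox transform for positive data (hypothetical helper, base R only):
#   w = log(y)                  if lambda == 0
#   w = (y^lambda - 1) / lambda otherwise
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, mapping w back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

# Example with the lambda estimated for enplanements above
y <- c(10, 100, 1000)
w <- box_cox(y, -0.2269461)
inv_box_cox(w, -0.2269461)   # recovers y up to floating-point rounding
```

Values of λ between 0 and 1 compress large values less strongly than a log does; negative values, as for enplanements, compress them more strongly.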

3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

(cangas_lb <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(BoxCox(cangas,cangas_lb))

autoplot(cangas)

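  • The transformation has little visible effect: with λ ≈ 0.58 the transformed series looks much like the original. A good λ makes the size of the seasonal variation about the same across the whole series, but no single λ can do that here.
  • The seasonal variation in cangas is small at the start, largest in the middle of the series, and smaller again towards the end, even though the level generally keeps rising. A Box-Cox transformation can only shrink or stretch the variation as a function of the level, so it cannot fix variation that rises and then falls.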
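One way to check this explanation is to tabulate the level and spread of each calendar year. The sketch below uses a hypothetical helper, year_spread(), and is demonstrated on base R's AirPassengers, where the spread grows steadily with the level (the case Box-Cox handles well). With fpp2 loaded, year_spread(cangas) gives the same table for the gas data, where the sd column should rise and then fall if the reading above is right.

```r
# Per-calendar-year level (mean) and spread (standard deviation) of a ts.
# If the spread rises and falls while the level keeps rising, no single
# Box-Cox lambda can equalise the seasonal variation.
year_spread <- function(x) {
  yr     <- floor(as.numeric(time(x)) + 1e-8)  # calendar year of each observation
  level  <- tapply(as.numeric(x), yr, mean)
  spread <- tapply(as.numeric(x), yr, sd)
  data.frame(year  = as.numeric(names(level)),
             level = as.numeric(level),
             sd    = as.numeric(spread))
}

# Illustration on base R's AirPassengers
ap <- year_spread(AirPassengers)
ap
```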

3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

library(readxl)   # provides read_excel()
retaildata <- read_excel('retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349640L"],frequency=12, start=c(1982,4))
(myts_lb <- BoxCox.lambda(myts))
## [1] 0.1311965
autoplot(BoxCox(myts,myts_lb))

autoplot(myts)

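  • The Box-Cox transformation does appear to have an effect: with λ ≈ 0.13, close to a log transformation, the seasonal swings in the transformed series look more even in size than in the original.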
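# Random walk with drift fitted on the Box-Cox scale, then back-transformed
# either simply (forecast1) or with bias adjustment (forecast2)
forecast1 <- rwf(myts, drift = TRUE, lambda = myts_lb, h = 50, level = 80)
forecast2 <- rwf(myts, drift = TRUE, lambda = myts_lb, h = 50, level = 80, biasadj = TRUE)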
## Warning in `/.default`(0.5 * fvar * (1 - lambda), (out)^(2 * lambda)): Recycling array of length 1 in array-vector arithmetic is deprecated.
##   Use c() or as.vector() instead.
autoplot(myts) +
  autolayer(forecast1, series= 'Simple Back Transformation') +
  autolayer(forecast2, series= 'Bias Adjusted', PI=FALSE) +
  guides(color =guide_legend(title = 'Model Forecast'))

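  • In this instance, I think the simple back-transformed forecasts follow the trend of the data better; the bias-adjusted forecasts rise more steeply than the recent history suggests. Some of this is built in: the bias-adjusted forecast is the mean of the back-transformed forecast distribution rather than the median, so with λ < 1 it is always the higher of the two, and the adjustment grows with the forecast variance.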
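The two back-transformations can be written out directly. The sketch below uses a hypothetical helper, back_transform(): with sigma2 = 0 it gives the simple back-transform (the median), and with sigma2 > 0 it applies the bias adjustment y = (λw + 1)^(1/λ) * [1 + σ²(1 - λ) / (2(λw + 1)²)]. This matches the term in the warning above, `0.5 * fvar * (1 - lambda)` divided by `out^(2 * lambda)`, since out^(2λ) = (λw + 1)². The w and sigma2 values in the example are illustrative only, not taken from the retail forecasts.

```r
# Back-transform a Box-Cox point forecast w (hypothetical helper).
# sigma2 = 0 gives the simple back-transform (forecast median);
# sigma2 > 0 gives the bias-adjusted back-transform (approximate mean).
back_transform <- function(w, lambda, sigma2 = 0) {
  med <- if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
  med * (1 + sigma2 * (1 - lambda) / (2 * (lambda * w + 1)^2))
}

lambda <- 0.1311965          # myts_lb from above
w      <- 7                  # illustrative forecast on the transformed scale
sigma2 <- c(0, 0.01, 0.05)   # illustrative forecast variances, growing with h
back_transform(w, lambda, sigma2)
```

Because the variance of a random-walk forecast grows with the horizon, the adjustment factor is smallest for the first few forecasts.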

3.8

For your retail time series (from Exercise 3 in Section 2.10):

a. Split the data into two parts using:
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
b. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
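The plot is a visual check; the same split can be confirmed numerically by checking that the two windows meet without overlapping and together cover the whole series. A sketch on base R's AirPassengers, used as a stand-in because the retail file is not bundled with R:

```r
x     <- AirPassengers
train <- window(x, end = c(1957, 12))   # training: up to Dec 1957
test  <- window(x, start = 1958)        # test: Jan 1958 onwards

# The two parts should cover the series exactly, with no gap or overlap
length(train) + length(test) == length(x)
end(train)     # last training period
start(test)    # first test period
```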

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
d. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  13.10781 29.87236 20.66937  7.478844 11.485020 1.000000
## Test set     -28.33750 36.99343 29.92083 -7.630292  8.058562 1.447593
##                   ACF1 Theil's U
## Training set 0.8224883        NA
## Test set     0.4432892  1.292012
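The measures in this table follow standard definitions and can be reproduced by hand. The sketch below is a base-R version with a hypothetical function name, accuracy_by_hand(). MASE divides the MAE by the in-sample MAE of the seasonal naive method (period m), which is why the training-set MASE of snaive above is exactly 1.

```r
# Point-forecast accuracy measures (hypothetical helper, base R only).
# 'train' is used only for the MASE scaling factor.
accuracy_by_hand <- function(actual, fc, train, m = 12) {
  e     <- as.numeric(actual) - as.numeric(fc)           # forecast errors
  scale <- mean(abs(diff(as.numeric(train), lag = m)))   # in-sample seasonal naive MAE
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / as.numeric(actual)),
    MAPE = mean(abs(100 * e / as.numeric(actual))),
    MASE = mean(abs(e)) / scale)
}

# Toy example: errors are 2 and -5, seasonal differences of 1:24 are all 12
accuracy_by_hand(actual = c(10, 20), fc = c(8, 25), train = 1:24)
```

Applied as accuracy_by_hand(myts.test, fc$mean, myts.train), this should reproduce the ME to MASE columns of the test-set row above.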
e. Check the residuals.

Do the residuals appear to be uncorrelated and normally distributed?

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 1011.9, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
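  • The Ljung-Box statistic Q* = 1011.9 (df = 24, p < 2.2e-16) is far larger than expected for white noise, so the residuals are autocorrelated. The lag-1 autocorrelation of the training residuals is 0.82 (ACF1 in the accuracy table above).
  • The histogram has a slight right skew, so the residuals do not look normally distributed.
  • The ACF plot appears to show some seasonal pattern remaining in the residuals.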
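The Q* reported by checkresiduals() is the Ljung-Box statistic Q* = T(T + 2) Σ_{k=1}^{h} r_k² / (T - k), compared against a χ² distribution with h - K degrees of freedom, where K is the number of model parameters (K = 0 for snaive, hence df = 24 above). A minimal base-R sketch with a hypothetical helper, ljung_box(), checked against base R's Box.test() on simulated white noise:

```r
# Ljung-Box statistic (hypothetical helper, base R only):
#   Q* = n(n + 2) * sum_{k=1}^{h} r_k^2 / (n - k),  df = h - K
ljung_box <- function(e, h, K = 0) {
  e <- e[!is.na(e)]                                # drop missing residuals
  n <- length(e)
  r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]   # autocorrelations r_1, ..., r_h
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
  c(Q = q, p = pchisq(q, df = h - K, lower.tail = FALSE))
}

# Check against base R on simulated white noise
set.seed(1)
wn <- rnorm(200)
ljung_box(wn, 24)
Box.test(wn, lag = 24, type = "Ljung-Box")
```

Dropping NAs is a simplification: Box.test() keeps them via na.pass, so for residuals with missing values (such as the first year of snaive residuals) the two can differ slightly.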
f. How sensitive are the accuracy measures to the training/test split?
mean1 <- meanf(myts.train,h=50)
forecast1 <- rwf(myts.train, h=50)
forecast2 <- rwf(myts.train, drift =TRUE, h=50)

autoplot(myts.train) +
autolayer(mean1, series="Mean") +
autolayer(forecast1, series="Naive")+
autolayer(forecast2, series = "Drift")+
guides(colour = guide_legend(title = "Forecasts"))

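  • The mean error (ME) and mean percentage error (MPE) appear to be the most sensitive to the split. Positive and negative errors cancel in both, so they can even change sign, as they do between the training and test rows of the snaive accuracy table above.
  • The other measures (RMSE, MAE, MAPE, MASE) appear less sensitive, although they still vary between splits.
  • Of the three benchmark methods, the drift method appears to follow the data most closely.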
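The sensitivity question can also be tested directly by moving the split point and recomputing the test-set accuracy. A minimal sketch, assuming a hand-coded seasonal naive forecast (hypothetical snaive_by_hand(), standing in for forecast's snaive()) and using base R's AirPassengers in place of the retail series; the training window ends in different years and the test window is always the following two years.

```r
# Seasonal naive forecasts: repeat the last observed year of the training data.
snaive_by_hand <- function(train, h, m = frequency(train)) {
  last_year <- tail(as.numeric(train), m)
  rep_len(last_year, h)
}

# Test-set RMSE and MAPE for several training end-years
split_sensitivity <- function(x, end_years) {
  out <- lapply(end_years, function(yr) {
    train <- window(x, end = c(yr, 12))
    test  <- window(x, start = c(yr + 1, 1), end = c(yr + 2, 12))
    e     <- as.numeric(test) - snaive_by_hand(train, length(test))
    data.frame(train_end = yr,
               RMSE = sqrt(mean(e^2)),
               MAPE = mean(abs(100 * e / as.numeric(test))))
  })
  do.call(rbind, out)
}

split_sensitivity(AirPassengers, 1954:1958)
```

For the retail series, the same loop could call snaive() on window(myts, end = c(yr, 12)) and pass the result to accuracy() together with the matching test window.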