Chapter 3 - The forecaster’s toolbox

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.

* usnetelec
* usgdp
* mcopper
* enplanements

usnetelec

library(fpp2)  # loads forecast, ggplot2 and the example datasets used below
cat("Lambda for usnetelec = ", BoxCox.lambda(usnetelec))

## Lambda for usnetelec =  0.5167714

autoplot(BoxCox(usnetelec, BoxCox.lambda(usnetelec)))
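The Box-Cox transformation applied by BoxCox() is w = log(y) when lambda = 0 and w = (y^lambda - 1)/lambda otherwise. The base-R sketch below writes out the transformation and its inverse. box_cox() and inv_box_cox() are hypothetical helper names, not functions from the forecast package, and the sketch assumes strictly positive data.

```r
# Minimal base-R sketch of the Box-Cox transformation and its inverse.
# box_cox() and inv_box_cox() are hypothetical helpers (not forecast functions);
# they assume y > 0.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 50, 250)
w <- box_cox(y, 0.5167714)               # lambda reported above for usnetelec
all.equal(inv_box_cox(w, 0.5167714), y)  # TRUE: the round trip recovers y
```

Writing the inverse next to the forward transformation makes it clear why forecasts made on the transformed scale must be back-transformed before they are compared with the original data.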

usgdp

cat("Lambda for usgdp = ", BoxCox.lambda(usgdp))

## Lambda for usgdp =  0.366352

autoplot(BoxCox(usgdp, BoxCox.lambda(usgdp)))

mcopper

cat("Lambda for mcopper = ", BoxCox.lambda(mcopper))

## Lambda for mcopper =  0.1919047

autoplot(BoxCox(mcopper, BoxCox.lambda(mcopper)))

enplanements

cat("Lambda for enplanements = ", BoxCox.lambda(enplanements))

## Lambda for enplanements =  -0.2269461

autoplot(BoxCox(enplanements, BoxCox.lambda(enplanements)))
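Besides eyeballing the plots, one rough way to check that a transformation has stabilized the variance is to compare the standard deviation within each calendar year before and after transforming. The sketch below uses the base-R AirPassengers series purely as a stand-in (it is not one of the series in this exercise) and a log transformation (lambda = 0); yearly_sd() is a hypothetical helper.

```r
# Sketch: within-year standard deviations before and after a log transform.
# AirPassengers (monthly, Jan 1949 - Dec 1960) is only a base-R stand-in.
x <- AirPassengers

yearly_sd <- function(series) {
  m <- matrix(as.numeric(series), nrow = 12)  # one column per calendar year
  apply(m, 2, sd)
}

sd_raw <- yearly_sd(x)
sd_log <- yearly_sd(log(x))

# Ratio of largest to smallest within-year standard deviation;
# a value close to 1 means the variation is roughly constant over time.
spread_raw <- max(sd_raw) / min(sd_raw)
spread_log <- max(sd_log) / min(sd_log)
c(raw = spread_raw, log = spread_log)
```

On the raw scale the yearly spread grows with the level of the series, while after the log transformation the ratio is much closer to 1.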

Why is a Box-Cox transformation unhelpful for the cangas data?

autoplot(cangas)

autoplot(BoxCox(cangas, BoxCox.lambda(cangas)))

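The Box-Cox transformation does not give a simpler model here. The seasonal variation in cangas is largest in the middle of the series and smaller at both ends, instead of growing with the level of the series. A Box-Cox transformation rescales the variation as a monotonic function of the level, so it cannot even out this pattern, and the transformed series still shows the wider swings in the middle.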
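To see why no choice of lambda can fix a series whose seasonal swings are widest in the middle of the sample, the sketch below builds a hypothetical simulated series (not cangas itself) with a steadily rising level and a hump-shaped seasonal amplitude. It then measures, for a grid of lambda values, how unequal the within-year standard deviations remain after transformation.

```r
# Sketch with simulated (hypothetical) data: rising level, seasonal amplitude
# small at both ends and large in the middle of the sample.
level <- rep(seq(100, 400, length.out = 10), each = 12)       # 10 "years"
amp   <- rep(c(2, 10, 20, 30, 40, 40, 30, 20, 10, 2), each = 12)
month <- rep(1:12, times = 10)
y <- level + amp * sin(2 * pi * month / 12)

box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# For each lambda: ratio of largest to smallest within-year standard deviation.
lambdas <- seq(-1, 1, by = 0.25)
spread <- sapply(lambdas, function(lambda) {
  sds <- apply(matrix(box_cox(y, lambda), nrow = 12), 2, sd)
  max(sds) / min(sds)
})
round(setNames(spread, lambdas), 1)
```

Every lambda leaves the ratio far from 1, because the transformation can only shrink or stretch variation according to the level, and here the level keeps rising while the variation rises and then falls.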

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retail_data <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retail_data[,"A3349873A"], frequency=12, start=c(1982,4))
lambda_retail <- BoxCox.lambda(myts)
cat("selected lambda:", lambda_retail)

## selected lambda: 0.1276369
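As far as I know, BoxCox.lambda() chooses lambda with Guerrero's method by default (method = "guerrero", searching lambda in [-1, 2]). The sketch below is my own base-R rendering of that idea, not the package code: split the series into yearly blocks, and pick the lambda that makes sd / mean^(1 - lambda) as constant as possible across blocks (smallest coefficient of variation). guerrero_cv() and guerrero_lambda() are hypothetical names, and AirPassengers stands in for the retail data.

```r
# Sketch of Guerrero's method for choosing a Box-Cox lambda.
# guerrero_cv() and guerrero_lambda() are hypothetical helpers.
guerrero_cv <- function(lambda, x, period) {
  n_years <- floor(length(x) / period)
  keep <- tail(as.numeric(x), n_years * period)   # drop an incomplete first block
  m <- matrix(keep, nrow = period)                # one column per block (year)
  ratio <- apply(m, 2, sd) / apply(m, 2, mean)^(1 - lambda)
  sd(ratio) / mean(ratio)                         # coefficient of variation
}

guerrero_lambda <- function(x, lower = -1, upper = 2) {
  period <- max(2, round(frequency(x)))
  optimize(guerrero_cv, c(lower, upper), x = x, period = period)$minimum
}

lam <- guerrero_lambda(AirPassengers)  # AirPassengers is a base-R stand-in
lam
```

A lambda near 0 (as for the retail series) means the transformation behaves almost like a log, i.e. the seasonal swings grow roughly in proportion to the level.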

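# Drift forecasts made on the Box-Cox scale and back-transformed (median forecasts)
fc_retail <- rwf(myts, drift = TRUE, lambda = lambda_retail, h = 50, level = 80)
# Same model with bias adjustment (mean forecasts)
fc_retail_biasadj <- rwf(myts, drift = TRUE, lambda = lambda_retail, h = 50, level = 80, biasadj = TRUE)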

autoplot(myts) +
  autolayer(fc_retail, series = "Drift method with Box-Cox Transformation") +
  autolayer(fc_retail_biasadj$mean, series = "Bias Adjusted") +
  guides(colour = guide_legend(title = "Forecast"))

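I would select lambda = 0.128 (the value chosen by BoxCox.lambda()) together with bias adjustment. The bias-adjusted forecasts estimate the mean rather than the median of the back-transformed forecast distribution, so they sit slightly above the unadjusted drift forecasts.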
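The bias adjustment works on the back-transformation step. For a point forecast w on the transformed scale with forecast variance sigma2, the back-transformed mean is approximately exp(w)(1 + sigma2/2) when lambda = 0, and (lambda*w + 1)^(1/lambda) * (1 + sigma2*(1 - lambda) / (2*(lambda*w + 1)^2)) otherwise. The sketch below codes these formulas in base R; back_transform() is a hypothetical helper, and the values of w and sigma2 are made up for illustration.

```r
# Sketch of Box-Cox back-transformation with optional bias adjustment.
# back_transform() is a hypothetical helper: biasadj = FALSE gives the median,
# biasadj = TRUE an approximation of the mean of the forecast distribution.
back_transform <- function(w, lambda, sigma2 = 0, biasadj = FALSE) {
  if (lambda == 0) {
    y <- exp(w)
    if (biasadj) y <- y * (1 + sigma2 / 2)
  } else {
    base <- lambda * w + 1
    y <- base^(1 / lambda)
    if (biasadj) y <- y * (1 + sigma2 * (1 - lambda) / (2 * base^2))
  }
  y
}

lambda <- 0.1276369   # value selected for the retail series above
w <- 6                # hypothetical point forecast on the transformed scale
back_transform(w, lambda)                                # median forecast
back_transform(w, lambda, sigma2 = 0.05, biasadj = TRUE) # mean forecast (larger)
```

With lambda < 1 and a positive forecast variance the correction factor exceeds 1, which is why the bias-adjusted line always lies above the unadjusted one, and the gap widens as the forecast variance grows with the horizon.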

For your retail time series (from Exercise 3 in Section 2.10):

Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
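The plot is a visual check; the same check can be done numerically by confirming that the training set ends exactly one month before the test set starts and that no observation is lost or duplicated. The sketch uses base R's window(); AirPassengers (1949-1960) stands in for myts, so the split date is an assumption chosen to fit that series rather than the 2010/2011 split used for the retail data.

```r
# Sketch: split a monthly series with window() and verify the split.
# AirPassengers stands in for myts; the 1957/1958 split is illustrative.
x <- AirPassengers
train <- window(x, end = c(1957, 12))
test  <- window(x, start = 1958)

end(train)    # 1957 12
start(test)   # 1958 1
length(train) + length(test) == length(x)   # TRUE: nothing lost or duplicated
```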

Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
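The seasonal naive method sets each forecast equal to the last observed value from the same season, so for monthly data every forecast for January repeats the final January in the training set. The base-R sketch below computes only the point forecasts; snaive_sketch() is a hypothetical helper (forecast::snaive() also returns prediction intervals), its default horizon of two seasonal cycles is meant to match snaive()'s default h, and AirPassengers stands in for myts.train.

```r
# Minimal sketch of seasonal naive point forecasts.
# snaive_sketch() is a hypothetical helper, not part of the forecast package.
snaive_sketch <- function(y, h = 2 * frequency(y)) {
  m <- frequency(y)
  last_season <- tail(as.numeric(y), m)        # final m observations
  ts(rep_len(last_season, h),                  # repeat them to cover h steps
     start = tsp(y)[2] + 1 / m, frequency = m)
}

train <- window(AirPassengers, end = c(1957, 12))  # stand-in for myts.train
fc_sketch <- snaive_sketch(train)
head(as.numeric(fc_sketch), 3)   # repeats Jan-Mar 1957
```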

Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)
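The main columns reported by accuracy() can be computed by hand from the forecast errors e = actual - forecast. For a seasonal series, MASE divides the MAE by the in-sample MAE of the seasonal naive method, which is why the training-set MASE of a seasonal naive forecast is exactly 1. accuracy_sketch() is a hypothetical helper, and the toy numbers are made up so the results are easy to check.

```r
# Sketch of the error measures shown by accuracy().
# accuracy_sketch() is a hypothetical helper; m is the seasonal period.
accuracy_sketch <- function(actual, fcast, train, m) {
  e  <- actual - fcast                          # forecast errors
  pe <- 100 * e / actual                        # percentage errors
  scale <- mean(abs(diff(train, lag = m)))      # in-sample seasonal naive MAE
  c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)),
    MPE = mean(pe), MAPE = mean(abs(pe)), MASE = mean(abs(e)) / scale)
}

acc <- accuracy_sketch(actual = c(10, 20), fcast = c(8, 25), train = 1:6, m = 2)
acc
# ME = -1.5, RMSE = sqrt(14.5) (about 3.81), MAE = 3.5,
# MPE = -2.5, MAPE = 22.5, MASE = 3.5 / 2 = 1.75
```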

Check the residuals.

checkresiduals(fc)

Do the residuals appear to be uncorrelated and normally distributed?

How sensitive are the accuracy measures to the training/test split?

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

autoplot(myts) + 
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

fc <- snaive(myts.train)

accuracy(fc, myts.test)

##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.5315239  1.297866

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

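The residuals are clearly correlated with each other: the Ljung-Box test gives Q* = 624.45 on 24 lags (p-value < 2.2e-16), and the lag-1 autocorrelation of the training residuals (ACF1 in the accuracy table) is 0.74. Their distribution also does not look normal.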
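The Ljung-Box part of checkresiduals() can be reproduced with base R. Seasonal naive residuals are e_t = y_t - y_(t-m), so they can be rebuilt with diff(), and stats::Box.test() runs the test. lag = 24 and fitdf = 0 mirror the "Total lags used: 24" and "Model df: 0" lines in the output above; AirPassengers is a stand-in for myts.train, so the statistic will differ from 624.45.

```r
# Sketch: Ljung-Box test on seasonal naive residuals using base R only.
# AirPassengers is a stand-in for myts.train.
train <- window(AirPassengers, end = c(1957, 12))
res <- diff(train, lag = frequency(train))    # seasonal naive residuals
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
lb$p.value   # a tiny p-value means the residuals are autocorrelated
```

For a trending series the seasonal naive residuals inherit the year-on-year growth, which is what drives the strong autocorrelation.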

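One way to gauge the sensitivity is the ratio of each test-set measure to the corresponding training-set measure:

* ME is about 7.1 times larger on the test set. The Mean Error is highly sensitive by definition, because positive and negative errors cancel out.
* RMSE, MAE and MASE are about 3.5 times larger, and MPE about 3.2 times larger, so they are also sensitive.
* MAPE is only about 1.9 times larger, and ACF1 drops from 0.74 to 0.53, so these two are much less sensitive.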
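A more direct check of sensitivity to the split is to move the split point and recompute the test-set errors. The sketch below does this in base R with seasonal naive forecasts and a fixed 12-month test window, so only the split point changes. AirPassengers, the split years and the choice of RMSE and MAPE are assumptions made for illustration.

```r
# Sketch: seasonal naive test errors for several training/test split points.
# AirPassengers and the split years 1955-1959 are illustrative stand-ins.
x <- AirPassengers
m <- frequency(x)
split_years <- 1955:1959

results <- t(sapply(split_years, function(yr) {
  train <- window(x, end = c(yr, 12))
  test  <- window(x, start = yr + 1, end = c(yr + 1, 12))  # next 12 months
  fc <- rep_len(tail(as.numeric(train), m), length(test))  # seasonal naive
  e  <- as.numeric(test) - fc
  c(split = yr, RMSE = sqrt(mean(e^2)),
    MAPE = mean(abs(100 * e / as.numeric(test))))
}))
results
```

Scale-dependent measures such as RMSE tend to move with the level of the test year, while percentage measures such as MAPE are normalized by it, which is consistent with MAPE looking less sensitive in the ratios above.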

Data 624 - Homework 2

Ohannes (Hovig) Ohannessian

2/11/2019