Exercise 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
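
BoxCox.lambda() chooses a value of lambda automatically (by default using the Guerrero method), and BoxCox() applies the transformation. For reference, below is a minimal sketch of the classical Box-Cox formula, assuming strictly positive data; the forecast package's BoxCox() handles further edge cases, so this is an illustration rather than the package implementation.

# Sketch only: classical Box-Cox transformation for positive y
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}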

  • usnetelec
lambda <- BoxCox.lambda(usnetelec)
#> 0.5167714
autoplot(usnetelec)

autoplot(BoxCox(usnetelec,lambda))

  • usgdp
lambda <- BoxCox.lambda(usgdp)
#> 0.366352
autoplot(usgdp)

autoplot(BoxCox(usgdp,lambda))

  • mcopper
lambda <- BoxCox.lambda(mcopper)
#> 0.1919047
autoplot(mcopper)

autoplot(BoxCox(mcopper,lambda))

  • enplanements
lambda <- BoxCox.lambda(enplanements)
#> -0.2269461
autoplot(enplanements)

autoplot(BoxCox(enplanements,lambda))

Exercise 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

lambda <- BoxCox.lambda(cangas)
#> 0.5767759
autoplot(cangas)

autoplot(BoxCox(cangas,lambda))

The two plots look essentially the same: the Box-Cox transformation does not stabilise the seasonal variation. The variation in cangas does not grow steadily with the level of the series (it is largest in the middle of the series and smaller at both ends), so no single monotonic transformation can even it out.
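
One way to see this is to try a few values of lambda side by side; none of them removes the changing seasonal variation. A rough sketch, assuming gridExtra is available for arranging the plots:

library(gridExtra)
grid.arrange(
  autoplot(cangas),
  autoplot(BoxCox(cangas, 0)),    # log transformation
  autoplot(BoxCox(cangas, 0.5)),  # square-root-like transformation
  nrow = 3
)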

Exercise 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

# Chose column B: Turnover, New South Wales, Supermarket and grocery stores
myts <- ts(retaildata[,"A3349335T"], 
  frequency=12, start=c(1982,4))

lambda <- BoxCox.lambda(myts)
#> 0.193853
autoplot(myts)

autoplot(BoxCox(myts,lambda))

Visually, the transformation is an improvement: the seasonal variation is now roughly constant across the whole series.
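
Since the selected lambda of about 0.19 is fairly close to zero, a plain log transformation would give a very similar result and is easier to interpret. A quick comparison (sketch):

# Compare the chosen Box-Cox transformation with a simple log transformation
autoplot(BoxCox(myts, lambda))
autoplot(log(myts))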

Exercise 3.8

For your retail time series (from Exercise 3 in Section 2.10):

  1. Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
  2. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

  3. Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
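
snaive() produces seasonal naive forecasts: each forecast equals the last observed value from the same month of the year in the training data. Plotting the forecasts against the test set (a quick sketch) shows how well that assumption holds:

autoplot(fc) + autolayer(myts.test, series="Test")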
  4. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                    ME      RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000 0.6018274
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629 0.2686595
##              Theil's U
## Training set        NA
## Test set     0.9036205
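
Note that the training-set MASE is exactly 1: MASE scales errors by the in-sample MAE of the seasonal naive method, which is the method being used here. As a sanity check, the test-set RMSE and MAE can be reproduced from the forecast mean (a sketch; the default forecast horizon is two seasonal cycles, so the comparison uses the matching part of the test set):

e <- window(myts.test, end=end(fc$mean)) - fc$mean
sqrt(mean(e^2))  # RMSE on the test set
mean(abs(e))     # MAE on the test set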
  5. Check the residuals.
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 812.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

The residuals appear correlated (the Ljung-Box test strongly rejects white noise). They look roughly normally distributed, though with a right tail, and their mean is not centred on zero, which suggests the forecasts are biased.
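
The bias can be checked directly from the mean of the training residuals, which corresponds to the training-set ME of about 61.6 reported by accuracy() above (a quick sketch):

# A mean well above zero means the seasonal naive forecasts systematically undershoot
mean(residuals(fc), na.rm=TRUE)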

  6. How sensitive are the accuracy measures to the training/test split?

To help answer this question, below is a slightly different training/test split for comparison with the one above. The accuracy measures appear quite sensitive to the split: ending the training set in 2008 instead of 2010 increases the test-set RMSE, MAPE and MASE by roughly 50%.

myts.train_2 <- window(myts, end=c(2008,12))
myts.test_2 <- window(myts, start=2009)
fc_2 <- snaive(myts.train_2)
accuracy(fc_2,myts.test_2)
##                     ME      RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set  58.79579  68.82721  58.92136 6.506662 6.523239 1.000000 0.6157221
## Test set     151.04167 165.81408 151.04167 7.480066 7.480066 2.563445 0.5142346
##              Theil's U
## Training set        NA
## Test set      1.273142
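
A more systematic way to assess this sensitivity is time series cross-validation, which averages forecast errors over many rolling training/test splits rather than relying on a single split. A rough sketch using tsCV() from the forecast package (it loops over every possible forecast origin, so it can be slow on long series):

e <- tsCV(myts, snaive, h=12)
sqrt(mean(e^2, na.rm=TRUE))  # cross-validated RMSE across all origins and horizons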