DATA624: Predictive Analytics: HW#12

Exercise 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec

library(fpp2)  # loads forecast, ggplot2 and the example data sets used below

autoplot(usnetelec)

(lambda <- BoxCox.lambda(usnetelec))

## [1] 0.5167714

autoplot(BoxCox(usnetelec,lambda))

usgdp

autoplot(usgdp)

(lambda <- BoxCox.lambda(usgdp))

## [1] 0.366352

autoplot(BoxCox(usgdp,lambda))

mcopper

autoplot(mcopper)

(lambda <- BoxCox.lambda(mcopper))

## [1] 0.1919047

autoplot(BoxCox(mcopper,lambda))

enplanements

autoplot(enplanements)

(lambda <- BoxCox.lambda(enplanements))

## [1] -0.2269461

autoplot(BoxCox(enplanements,lambda))
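
For reference, the transformation that BoxCox() applies to these positive-valued series can be written out by hand. The sketch below assumes the textbook definition (the natural log when lambda is 0, and (y^lambda - 1)/lambda otherwise). The helper names box_cox and inv_box_cox are illustrative only and are not functions from the forecast package.

```r
# Box-Cox transform for positive data (illustrative helper, not forecast::BoxCox)
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, mapping values back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 100, 1000)

# lambda = 0.5 behaves like a square root: large values are compressed the most
w <- box_cox(y, lambda = 0.5)
y_back <- inv_box_cox(w, lambda = 0.5)

# As lambda approaches 0 the transform approaches the natural log
near_log <- box_cox(y, lambda = 1e-6)
print(cbind(y, w, y_back, near_log, log_y = log(y)))
```

Values of lambda below 1 compress large observations more than small ones, which is why the estimates above (all below 1) pull in the larger swings at high levels of each series.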

Exercise 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

autoplot(cangas)

(lambda <- BoxCox.lambda(cangas))

## [1] 0.5767759

autoplot(BoxCox(cangas,lambda))

Answer:

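The plots before and after the transformation look essentially the same, so the transformation does not stabilise the variance. According to Section 3.2 of our textbook: “If the data show variation that increases or decreases with the level of the series, then a transformation can be useful.” In the plot, the variation in the cangas data first increases and then decreases in the more recent years, so it does not change in a single direction with the level of the series, and a single value of lambda cannot even it out.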
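
The point can be illustrated with a small base-R sketch. It uses a synthetic monthly series (an assumption for illustration, not the cangas data) whose level rises steadily while its seasonal amplitude grows and then shrinks, and compares the within-year standard deviation before and after a log transform (Box-Cox with lambda = 0).

```r
# Synthetic monthly series (illustrative assumption, not the cangas data):
# the level rises steadily while the seasonal amplitude grows, then shrinks.
n_years <- 30
level <- rep(seq(5, 20, length.out = n_years), each = 12)
amp   <- rep(c(seq(0.5, 3, length.out = 15), seq(3, 0.5, length.out = 15)), each = 12)
month <- rep(1:12, times = n_years)
y <- ts(level + amp * sin(2 * pi * month / 12), frequency = 12, start = c(1960, 1))

# Standard deviation within each calendar year
annual_sd <- function(x) as.numeric(aggregate(x, nfrequency = 1, FUN = sd))

sd_raw <- annual_sd(y)
sd_log <- annual_sd(log(y))   # log is the Box-Cox transform with lambda = 0

# Ratio of the largest to the smallest annual spread;
# a value near 1 would indicate a stable variance
spread_ratio <- c(raw = max(sd_raw) / min(sd_raw),
                  log = max(sd_log) / min(sd_log))
print(round(spread_ratio, 2))
```

In this sketch the spread remains far from constant after the transform, because a power transform can only shrink or stretch variation monotonically with the level.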

Exercise 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

#setwd("/Users/elinaazrilyan/Documents/Data624/")
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"],
  frequency=12, start=c(1982,4))

autoplot(myts)

(lambda <- BoxCox.lambda(myts))

## [1] 0.2324297

autoplot(BoxCox(myts,lambda))

Exercise 3.8

For your retail time series (from Exercise 3 in Section 2.10):

a) Split the data into two parts using:

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b) Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

c) Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)

d) Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                    ME     RMSE      MAE      MPE     MAPE     MASE
## Training set 46.37387 59.27744 47.03213 7.755343 7.848159 1.000000
## Test set     67.42917 76.67352 67.42917 4.447568 4.447568 1.433683
##                   ACF1 Theil's U
## Training set 0.8279637        NA
## Test set     0.6327744  1.004278
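
To make the table easier to read, the sketch below computes the main measures by hand for a seasonal naive forecast. It uses the built-in AirPassengers series as a stand-in (an assumption, not the retail data), and the variable names are illustrative. MASE divides the test MAE by the in-sample MAE of the seasonal naive method, which is why the training-set MASE above is exactly 1 and the test-set value is 67.43 / 47.03, or about 1.43. The identical ME and MAE on the test set also indicate that every test-period forecast was below the actual value.

```r
# AirPassengers (built into R) is a stand-in here, not the retail series
y <- AirPassengers
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1), end = c(1959, 12))

# Seasonal naive forecast: each month repeats the value from the last training year
fc  <- window(train, start = c(1958, 1))
err <- as.numeric(test) - as.numeric(fc)

me   <- mean(err)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- mean(abs(err / as.numeric(test))) * 100

# MASE scales the MAE by the in-sample MAE of the seasonal naive method
insample_mae <- mean(abs(diff(train, lag = 12)))
mase <- mae / insample_mae

print(round(c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape, MASE = mase), 3))
```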

e) Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 2099.4, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The mean of the residuals is not close to zero, so the forecasts are biased. The ACF of the residuals from the seasonal naïve method shows significant autocorrelation, suggesting that the forecasts could be improved. The histogram suggests that the residuals are not normal: the distribution is right-skewed. Also, the Ljung-Box test gives a very small p-value (< 2.2e-16), indicating that the residuals are distinguishable from a white noise series.
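
The same two checks (residual mean and Ljung-Box test) can be reproduced with base R alone. This is a minimal sketch using AirPassengers as a stand-in series (an assumption, not the retail data). The seasonal naive residuals are taken to be the lag-12 differences, and 24 lags are used to match the output above.

```r
# AirPassengers is a stand-in here, not the retail series
y <- AirPassengers

# In-sample residuals of the seasonal naive method: y_t - y_{t-12}
res <- diff(y, lag = 12)

# A mean far from zero signals biased forecasts
res_mean <- mean(res)

# Ljung-Box test on the first 24 autocorrelations (two seasonal cycles)
lb <- Box.test(res, lag = 24, type = "Ljung-Box")

print(res_mean)
print(lb)
```

A small p-value here means the residual autocorrelations are jointly too large to be consistent with white noise.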

f) How sensitive are the accuracy measures to the training/test split?

Answer:

The accuracy measures are likely to be quite sensitive to the training/test split here, since the variation in the data increases over time, as is evident from the plot below. Putting a larger share of the data in the training set would allow a better model, but a very small test set would make that model hard to evaluate: the test set needs to be at least as large as the maximum forecast horizon required. Time series cross-validation would give a more reliable assessment.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
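
One way to see this sensitivity is to repeat the evaluation for several split points. The base-R sketch below uses AirPassengers as a stand-in (an assumption, not the retail data) and a hypothetical helper, snaive_rmse, which scores a 12-month seasonal naive forecast for a given final training year. The forecast package also provides tsCV() for rolling-origin evaluation.

```r
# AirPassengers is a stand-in here, not the retail series
y <- AirPassengers

# Test RMSE of a 12-month seasonal naive forecast when training ends in train_end_year
snaive_rmse <- function(y, train_end_year) {
  last_year <- window(y, start = c(train_end_year, 1), end = c(train_end_year, 12))
  actual    <- window(y, start = c(train_end_year + 1, 1), end = c(train_end_year + 1, 12))
  sqrt(mean((as.numeric(actual) - as.numeric(last_year))^2))
}

# Move the split point one year at a time
end_years <- 1950:1959
rmse_by_split <- sapply(end_years, snaive_rmse, y = y)
names(rmse_by_split) <- end_years

print(round(rmse_by_split, 1))

# Averaging over many splits gives a more stable estimate than any single split
print(mean(rmse_by_split))
```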

Elina Azrilyan

September 08, 2020
