Data624 The forecaster’s toolbox Assignment2

Chapter 3:

suppressMessages(suppressWarnings(library(fpp2)))
suppressMessages(suppressWarnings(library(readxl)))

3.1 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec:

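# Estimate the Box-Cox parameter (BoxCox.lambda uses Guerrero's method by default)
lmd = BoxCox.lambda(usnetelec)
lmd  
## [1] 0.5167714
# Transform the series and plot it next to the original
usnetelec.trans = BoxCox(usnetelec,lmd)
combined = cbind(usnetelec,usnetelec.trans)
autoplot(combined,facets=TRUE) + xlab("Year") + ggtitle("usnetelec")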

usgdp:

lmd = BoxCox.lambda(usgdp)
lmd  
## [1] 0.366352
usgdp.trans = BoxCox(usgdp,lmd)
combined = cbind(usgdp,usgdp.trans)
autoplot(combined,facets=TRUE) + xlab("Year") + ggtitle("usgdp")

mcopper:

lmd = BoxCox.lambda(mcopper)
lmd  
## [1] 0.1919047
mcopper.trans = BoxCox(mcopper,lmd)
combined = cbind(mcopper,mcopper.trans)
autoplot(combined,facets=TRUE) + xlab("Year") + ggtitle("mcopper")

enplanements:

lmd = BoxCox.lambda(enplanements)
lmd  
## [1] -0.2269461
enplanements.trans = BoxCox(enplanements,lmd)
combined = cbind(enplanements,enplanements.trans)
autoplot(combined,facets=TRUE) + xlab("Year") + ggtitle("enplanements")
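
For reference, the lambda values above index the Box-Cox family of transformations. A sketch of the standard one-parameter form for a positive series is given below; the symbols y_t (original observation) and w_t (transformed observation) are notation introduced here, not output from the code.

```latex
w_t =
\begin{cases}
  \log(y_t), & \lambda = 0, \\[4pt]
  \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0.
\end{cases}
```

Read this way, lambda = 1 leaves the shape of the series unchanged, lambda = 0.5 behaves like a square root, and lambda = 0 is a log transformation, so the estimate for usnetelec (0.52) is close to a square root, while the estimate for enplanements (-0.23) is somewhat stronger than a log.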

3.2 Why is a Box-Cox transformation unhelpful for the cangas data?

To answer this, let's first plot the cangas data alongside its Box-Cox transformed version.

cangas:

lmd = BoxCox.lambda(cangas)
lmd  
## [1] 0.5767759
cangas.trans = BoxCox(cangas,lmd)
combined = cbind(cangas,cangas.trans)
autoplot(combined,facets=TRUE) + xlab("Year") + ggtitle("cangas")

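This time series is monthly Canadian gas production, in billions of cubic metres, from January 1960 to February 2005. The transformed series looks almost identical to the original. The seasonal variation is small in the early years, largest in the middle of the series, and smaller again toward the end, so the variance does not rise or fall steadily with the level of the series. A Box-Cox transformation is monotonic and can only stabilise variance that changes consistently with the level, so it does not help here.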

3.3 What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

Retail data:

retaildata <- readxl::read_excel("C:/Users/rites/Documents/GitHub/Data624_Assignment1/retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],frequency=12, start=c(1982,4))
lmd = BoxCox.lambda(myts)
lmd  
## [1] 0.1276369
myts.trans = BoxCox(myts,lmd)
combined = cbind(myts,myts.trans)
autoplot(combined,facets=TRUE) + xlab("Year") + ggtitle("myts")

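BoxCox.lambda() suggests lambda = 0.1276369 for this series, so I would select a Box-Cox transformation with that value. It is close to 0, so the result is similar to a log transformation.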

3.8 For your retail time series (from Exercise 3 in Section 2.10):

a. Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)
##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.5315239  1.297866
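
To help read the MASE column, a sketch of the scaled error for a seasonal series is given below. The notation is introduced here for illustration: e_j is a forecast error, y_t is a training observation, T is the length of the training set, and m is the seasonal period (12 for this monthly series).

```latex
\text{MASE} = \operatorname{mean}\bigl(|q_j|\bigr),
\qquad
q_j = \frac{e_j}{\dfrac{1}{T-m}\displaystyle\sum_{t=m+1}^{T} \lvert y_t - y_{t-m} \rvert}
```

The denominator is the in-sample mean absolute error of the seasonal naive method, which is why the training-set MASE of snaive is exactly 1. The test-set value of about 3.5 means the out-of-sample errors are roughly three and a half times that in-sample benchmark.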

e. Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

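No. The Ljung-Box test strongly rejects the hypothesis of no autocorrelation (Q* = 624.45, df = 24, p-value < 2.2e-16), so the residuals are correlated with each other. They also do not appear to be normally distributed, and the training-set mean error of 7.77 suggests they are not centred on zero.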
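
As a reminder of what the Q* value in the output measures, the standard form of the Ljung-Box statistic is sketched below. The notation is introduced here: T is the number of residuals, r_k is the residual autocorrelation at lag k, h is the number of lags tested (24 in the output above), and K is the number of model degrees of freedom (0 here).

```latex
Q^{*} = T(T+2) \sum_{k=1}^{h} \frac{r_k^{2}}{T-k},
\qquad
Q^{*} \sim \chi^{2}_{h-K} \ \text{under the null hypothesis of no autocorrelation.}
```

With h = 24 and K = 0 the reference distribution has 24 degrees of freedom, matching the df = 24 reported by checkresiduals().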

f. How sensitive are the accuracy measures to the training/test split?

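The test-set errors are much larger than the training-set errors (RMSE 71.44 vs 20.25, MASE 3.50 vs 1.00), so the forecasts perform far worse out of sample than in sample. The positive test-set ME (55.3) shows that the forecasts fall consistently below the actual values, and because snaive simply repeats the last observed year, the size of that shortfall depends on where the training set ends and how long the test period is. Based on these results, the accuracy measures appear to be very sensitive to the training/test split.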
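
One way to check this sensitivity directly is to repeat the split at several training end years and compare the test-set measures. The following is a minimal sketch, not part of the original exercise: split_accuracy is a hypothetical helper name, and the choice of end years in the example call is arbitrary.

```r
library(fpp2)  # loads the forecast package, which provides snaive() and accuracy()

# Hypothetical helper (an assumption, not part of the assignment):
# refit snaive() for several training end years and collect the
# test-set accuracy measures for each split.
split_accuracy <- function(y, end_years) {
  res <- sapply(end_years, function(yr) {
    train <- window(y, end = c(yr, frequency(y)))  # through the last period of yr
    test  <- window(y, start = c(yr + 1, 1))       # everything after yr
    fc    <- snaive(train, h = length(test))       # forecast the whole test span
    accuracy(fc, test)["Test set", c("RMSE", "MAE", "MAPE", "MASE")]
  })
  colnames(res) <- end_years
  t(res)  # one row per split, one column per measure
}

# Example call with the retail series defined earlier:
# round(split_accuracy(myts, 2006:2010), 2)
```

If the rows of the resulting table differ substantially, that would support the conclusion that the measures depend strongly on the split. Note that this sketch forecasts the whole test period for each split, whereas snaive(myts.train) above uses the default horizon of two seasonal periods.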