Data 624 Homework 2

library(fpp2)

## -- Attaching packages ---------------------------------------- fpp2 2.4 --

## v ggplot2   3.1.0     v fma       2.4
## v forecast  8.12      v expsmooth 2.3

library(gridExtra)

(3.1)

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.

usnetelec
usgdp
mcopper
enplanements

#help(usnetelec)
#help(usgdp)
#help(mcopper)
#help(enplanements)

usnetelec

Annual US net electricity generation (billion kWh) for 1949-2003

We use the BoxCox.lambda() function to choose a value of lambda that makes the size of the variation about the same across the whole series. The function returns lambda = 0.5167714.

(lambda1 <- BoxCox.lambda(usnetelec))

## [1] 0.5167714

Original series and its Box-Cox transformation with lambda = 0.5167714:

plot1 <- autoplot(usnetelec)
plot2 <- autoplot(BoxCox(usnetelec,lambda1))
grid.arrange(plot1, plot2)
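For reference, the transformation applied by BoxCox() can be written out by hand. The sketch below uses the textbook definition (the natural log when lambda = 0, otherwise (y^lambda - 1)/lambda) and assumes strictly positive data. `box_cox` is a hypothetical helper, not a forecast function, and it is only expected to agree with BoxCox() for positive series.

```r
# Box-Cox transformation (sketch), textbook definition:
#   w = log(y)                  if lambda == 0
#   w = (y^lambda - 1) / lambda otherwise
# Hypothetical helper; assumes strictly positive data.
box_cox <- function(y, lambda) {
  if (any(y <= 0, na.rm = TRUE)) stop("this sketch assumes positive data")
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 4, 9, 16, 25)
box_cox(y, 1)     # lambda = 1: only shifts the data down by 1 -> 0 3 8 15 24
box_cox(y, 0.5)   # lambda = 0.5: rescaled square root 2*(sqrt(y) - 1) -> 0 2 4 6 8
box_cox(y, 0)     # lambda = 0: natural log
```

Under the positivity assumption, box_cox(usnetelec, lambda1) should give the same values as BoxCox(usnetelec, lambda1).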

usgdp

Quarterly US GDP, 1947:1 to 2006:1.

We use the BoxCox.lambda() function to choose a value of lambda that makes the size of the variation about the same across the whole series. The function returns lambda = 0.366352.

(lambda2 <- BoxCox.lambda(usgdp))

## [1] 0.366352

Original series and its Box-Cox transformation with lambda = 0.366352:

plot1 <- autoplot(usgdp)
plot2 <- autoplot(BoxCox(usgdp,lambda2))
grid.arrange(plot1, plot2)
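By default, BoxCox.lambda() selects lambda with Guerrero's method (method = "guerrero"; the alternative is method = "loglik"). The idea is to split the series into consecutive blocks of one seasonal period and compute each block's mean and standard deviation. Lambda is then chosen so that sd / mean^(1 - lambda) is as constant as possible, which means it has the smallest coefficient of variation. The base-R sketch below follows that idea. The helper names are hypothetical, and its output is not guaranteed to match BoxCox.lambda() to every decimal. In the two synthetic checks, spread proportional to the level should give lambda near 0, and constant spread should give lambda near 1.

```r
# Sketch of Guerrero's method for choosing a Box-Cox lambda.
# Hypothetical helpers; assumes a positive series with no missing values.
guerrero_cv <- function(lambda, x, period) {
  n_blocks <- floor(length(x) / period)
  # Use the most recent complete blocks of `period` observations
  keep <- tail(as.numeric(x), n_blocks * period)
  blocks <- matrix(keep, nrow = period, ncol = n_blocks)
  block_mean <- colMeans(blocks)
  block_sd <- apply(blocks, 2, sd)
  ratio <- block_sd / block_mean^(1 - lambda)
  # Coefficient of variation of the ratio: 0 means perfectly constant
  sd(ratio) / mean(ratio)
}

guerrero_lambda <- function(x, lower = -1, upper = 2) {
  # Non-seasonal (annual) data: fall back to blocks of 2 observations
  period <- max(2, round(frequency(x)))
  optimize(guerrero_cv, interval = c(lower, upper), x = x, period = period)$minimum
}

# Synthetic monthly series whose seasonal swings grow in proportion to the level
pattern <- c(1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 5, 6)
x <- ts(as.vector(outer(pattern, 1:10)), frequency = 12)
guerrero_lambda(x)   # close to 0 (log transformation)
```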

mcopper

Monthly copper prices.

We use the BoxCox.lambda() function to choose a value of lambda that makes the size of the variation about the same across the whole series. The function returns lambda = 0.1919047.

(lambda3 <- BoxCox.lambda(mcopper))

## [1] 0.1919047

Original series and its Box-Cox transformation with lambda = 0.1919047:

plot1 <- autoplot(mcopper)
plot2 <- autoplot(BoxCox(mcopper,lambda3))
grid.arrange(plot1, plot2)
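Because lambda = 0.19 is fairly close to 0, the transformation for mcopper behaves somewhat like a log. The short base-R check below shows numerically that (y^lambda - 1)/lambda approaches log(y) as lambda shrinks toward 0, which is why lambda = 0 is defined as the log. `box_cox` is a hypothetical helper, and the values in `y` are arbitrary illustrations rather than copper prices.

```r
# Hypothetical helper: textbook Box-Cox definition for positive data
box_cox <- function(y, lambda) if (lambda == 0) log(y) else (y^lambda - 1) / lambda

y <- c(50, 500, 5000)
lambdas <- c(1, 0.19, 0.01, 0.0001)
# Largest distance between the Box-Cox values and log(y) for each lambda
gaps <- sapply(lambdas, function(l) max(abs(box_cox(y, l) - log(y))))
print(data.frame(lambda = lambdas, max_gap_from_log = gaps))
```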

enplanements

Monthly US Domestic Revenue Enplanements (millions): 1996-2000

We use the BoxCox.lambda() function to choose a value of lambda that makes the size of the variation about the same across the whole series. The function returns lambda = -0.2269461. A negative lambda gives a somewhat stronger transformation than the log (lambda = 0).

(lambda4 <- BoxCox.lambda(enplanements))

## [1] -0.2269461

Original series and its Box-Cox transformation with lambda = -0.2269461:

plot1 <- autoplot(enplanements)
plot2 <- autoplot(BoxCox(enplanements,lambda4))
grid.arrange(plot1, plot2)

(3.2)

Why is a Box-Cox transformation unhelpful for the cangas data?

help(cangas)


cangas

Monthly Canadian gas production, billions of cubic metres, January 1960 - February 2005

We apply a Box-Cox transformation using the lambda chosen by BoxCox.lambda() to stabilize the variance. The value of lambda is 0.5767759.

(lambda <- BoxCox.lambda(cangas))

## [1] 0.5767759

cangas_plot <- autoplot(cangas)
cangas_boxcox <- autoplot(BoxCox(cangas,lambda))
grid.arrange(cangas_plot, cangas_boxcox)

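The Box-Cox transformed series does not look much different from the original. In cangas the seasonal variation is small at the start of the series, much larger in the middle, and smaller again near the end. A Box-Cox transformation can only even out variation that increases or decreases consistently with the level of the series. Here the variation rises and then falls, so no single value of lambda makes it uniform.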
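One way to see the problem is to tabulate the mean and standard deviation of each calendar year. A Box-Cox transformation can only make the spread uniform when the spread moves consistently with the level. The hypothetical base-R helper below computes these yearly summaries for a monthly ts. The example uses a small synthetic series (not cangas) whose level rises every year while its spread rises and then falls, which is a pattern no single lambda can flatten. With fpp2 loaded, yearly_spread(cangas) could be inspected the same way.

```r
# Mean and standard deviation of each calendar year of a ts object.
# Hypothetical helper; incomplete years are summarised as they are.
yearly_spread <- function(x) {
  # Small offset guards against times like 1960.9999999 from floating point
  year <- floor(as.numeric(time(x)) + 1e-8)
  values <- as.numeric(x)
  data.frame(
    year = sort(unique(year)),
    mean = as.numeric(tapply(values, year, mean)),
    sd   = as.numeric(tapply(values, year, sd))
  )
}

# Synthetic example: level 10 -> 20 -> 30, seasonal swing 1 -> 4 -> 1
x <- ts(rep(c(10, 20, 30), each = 12) + rep(c(1, 4, 1), each = 12) * rep(c(-1, 1), 18),
        start = c(2000, 1), frequency = 12)
yearly_spread(x)
```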

(3.3)

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

myts <- ts(retaildata[,"A3349335T"], frequency=12, start=c(1982,4))

Box-Cox transformation with lambda = 0.193853, as chosen by BoxCox.lambda():

(lambda <- BoxCox.lambda(myts))

## [1] 0.193853

plot1 <- autoplot(myts)
plot2 <- autoplot(BoxCox(myts, lambda))
grid.arrange(plot1, plot2)

The transformation makes the size of the variation more similar across the whole series, so I would select a Box-Cox transformation with lambda = 0.193853 (about 0.19).

(3.8)

For your retail time series (from Exercise 3 in Section 2.10):

Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
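As a complement to the plot below, the split can also be checked programmatically. The sketch uses a synthetic stand-in for myts. It assumes a monthly series starting in April 1982 and ending in December 2013, and the names x, x.train and x.test are hypothetical. The two pieces should not overlap and should together cover the whole series.

```r
# Synthetic stand-in for myts (assumption: monthly, April 1982 to December 2013)
x <- ts(seq_len(381), start = c(1982, 4), frequency = 12)
x.train <- window(x, end = c(2010, 12))
x.test  <- window(x, start = 2011)

# No overlap and nothing lost: lengths add up, and test starts right after train
length(x.train) + length(x.test) == length(x)
end(x.train)    # last training period
start(x.test)   # first test period
```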

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
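The seasonal naive method sets each forecast equal to the last observed value from the same season: yhat_{T+h} = y_{T+h-m(k+1)}, where m is the seasonal period and k is the integer part of (h-1)/m. The base-R sketch below computes only these point forecasts. `snaive_sketch` is a hypothetical helper, not the forecast implementation, which also returns prediction intervals and residuals. Like snaive(), it defaults to h = 2 * frequency.

```r
# Seasonal naive point forecasts: repeat the last full seasonal cycle.
# Hypothetical helper; assumes at least one full cycle of data.
snaive_sketch <- function(y, h = 2 * frequency(y)) {
  m <- frequency(y)
  n <- length(y)
  last_cycle <- as.numeric(y)[(n - m + 1):n]
  # Forecast step h reuses the same season from the last observed cycle
  fc <- last_cycle[(seq_len(h) - 1) %% m + 1]
  ts(fc, start = tsp(y)[2] + 1 / m, frequency = m)
}

y <- ts(c(1:12, 11:22), start = c(2000, 1), frequency = 12)
fc <- snaive_sketch(y, h = 15)
fc   # 11..22 for 2002, then 11, 12, 13 for January-March 2003
```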

Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                    ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
##                   ACF1 Theil's U
## Training set 0.6018274        NA
## Test set     0.2686595 0.9036205
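The measures in this table are simple functions of the forecast errors e = actual - forecast. For seasonal data, MASE divides the MAE by the in-sample MAE of the seasonal naive method. That is why the training-set MASE for snaive is exactly 1 in the table above. The base-R sketch below illustrates the definitions. `accuracy_sketch` is a hypothetical helper and may differ from accuracy() in edge cases such as missing values.

```r
# Point-forecast accuracy measures (sketch of the standard definitions).
# Hypothetical helper; assumes no missing values and nonzero actuals.
accuracy_sketch <- function(actual, forecast, train, m = frequency(train)) {
  e <- as.numeric(actual) - as.numeric(forecast)
  pe <- 100 * e / as.numeric(actual)
  # MASE scale: in-sample MAE of seasonal naive forecasts on the training data
  scale <- mean(abs(diff(as.numeric(train), lag = m)))
  c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)),
    MPE = mean(pe), MAPE = mean(abs(pe)), MASE = mean(abs(e)) / scale)
}

train <- ts(c(10, 20, 30, 40, 12, 22, 32, 42), frequency = 4)
res <- accuracy_sketch(actual = c(14, 24), forecast = c(12, 22), train = train)
res   # ME, RMSE and MAE are 2; the seasonal differences are all 2, so MASE is 1
```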

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 812.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

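The residuals are clearly autocorrelated (the training-set ACF1 is 0.60), which indicates that “there is information left in the residuals which should be used in computing forecasts”. The Ljung-Box test gives Q* = 812.76 on 24 degrees of freedom with a p-value < 2.2e-16, so the residuals are very unlikely to be white noise. The training-set mean error (ME = 61.57) is also well above zero, so the seasonal naive forecasts tend to be too low, consistent with an upward trend in the series.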
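The Ljung-Box statistic is Q* = T(T+2) * sum_{k=1}^{h} r_k^2 / (T-k), where T is the number of residuals, r_k is their lag-k autocorrelation, and h is the number of lags (24 here). Because snaive estimates no parameters, the degrees of freedom equal h. The sketch below computes Q* by hand in base R and compares it with stats::Box.test(). It uses simulated white noise rather than the retail residuals, so it is an illustration only. `ljung_box` is a hypothetical helper.

```r
# Ljung-Box statistic computed by hand, checked against stats::Box.test()
ljung_box <- function(x, h) {
  n <- length(x)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]   # drop lag 0
  n * (n + 2) * sum(r^2 / (n - seq_len(h)))
}

set.seed(624)
res <- rnorm(200)                       # white-noise stand-in for residuals
q_manual <- ljung_box(res, h = 24)
q_base <- Box.test(res, lag = 24, type = "Ljung-Box")
q_manual
q_base$statistic
q_base$p.value   # usually large for white noise, but not guaranteed for every seed
```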

How sensitive are the accuracy measures to the training/test split?

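To investigate this, we moved the split forward by one year, training on data up to December 2011 and testing from January 2012. The accuracy measures change noticeably. For example, the test-set RMSE drops from 109.63 to 97.02 and the test-set MASE from 1.62 to 1.32. So the accuracy measures are sensitive to the training/test split.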

New split:

myts.train2 <- window(myts, end=c(2011,12))
myts.test2 <- window(myts, start=2012)
fc2 <- snaive(myts.train2)
accuracy(fc2,myts.test2)

##                    ME     RMSE      MAE      MPE     MAPE     MASE
## Training set 62.05884 72.75961 62.35101 6.293351 6.316641 1.000000
## Test set     80.76250 97.02262 82.36250 3.634259 3.709092 1.320949
##                   ACF1 Theil's U
## Training set 0.5866573        NA
## Test set     0.2515324  0.707386

Original split:

accuracy(fc,myts.test)

##                    ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
##                   ACF1 Theil's U
## Training set 0.6018274        NA
## Test set     0.2686595 0.9036205
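Two splits give only two data points. A more systematic check is to loop over several split years and record the test-set RMSE of the seasonal naive forecasts for each one. The forecast package also provides tsCV() for time series cross-validation. The base-R sketch below uses a hypothetical helper, `snaive_rmse_by_split`, and a synthetic trending series in place of myts, so the numbers it prints say nothing about the retail data.

```r
# Test-set RMSE of seasonal naive forecasts for several training/test splits.
# Hypothetical helper; each split trains through the end of `yr`.
snaive_rmse_by_split <- function(y, split_years, h = 2 * frequency(y)) {
  m <- frequency(y)
  sapply(split_years, function(yr) {
    train <- window(y, end = c(yr, m))
    test  <- window(y, start = c(yr + 1, 1))
    test  <- as.numeric(test)[seq_len(min(h, length(test)))]
    n <- length(train)
    last_cycle <- as.numeric(train)[(n - m + 1):n]
    fc <- last_cycle[(seq_along(test) - 1) %% m + 1]
    sqrt(mean((test - fc)^2))
  })
}

# Synthetic trending monthly series with seasonality (stand-in for myts)
set.seed(1)
t_idx <- seq_len(240)
y <- ts(100 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(240),
        start = c(1994, 1), frequency = 12)
split_years <- 2005:2010
data.frame(train_end = split_years, test_rmse = snaive_rmse_by_split(y, split_years))
```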