Data 624 Assignment Two - Hyndman Forecasting Chapter 3

Corey Arnotus

2020-02-16

Load Libraries

library(forecast)
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
##   method             from    
##   fitted.fracdiff    fracdiff
##   residuals.fracdiff fracdiff
library(fpp2)
## Loading required package: ggplot2
## Loading required package: fma
## Loading required package: expsmooth

Question 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec

(lambda1 <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(usnetelec)

autoplot(BoxCox(usnetelec,lambda1))

usgdp

(lambda1 <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(usgdp)

autoplot(BoxCox(usgdp,lambda1))

mcopper

(lambda1 <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(mcopper)

autoplot(BoxCox(mcopper,lambda1))

enplanements

(lambda1 <- BoxCox.lambda(enplanements))
## [1] -0.2269461
autoplot(enplanements)

autoplot(BoxCox(enplanements,lambda1))
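
For reference, the transformation applied by BoxCox() in the chunks above can be sketched in base R. This is a minimal sketch that assumes strictly positive data; `box_cox` and `inv_box_cox` are hypothetical helper names used only for illustration, not functions from the forecast package.

```r
# Box-Cox transformation for strictly positive data (illustrative helper,
# not the forecast package's own implementation).
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0, na.rm = TRUE))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to return to the original scale.
inv_box_cox <- function(w, lambda) {
  if (abs(lambda) < 1e-8) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 100, 1000)
box_cox(y, 0)            # lambda = 0 is the natural log
box_cox(y, 1)            # lambda = 1 only shifts the data: y - 1
w <- box_cox(y, 0.5)     # a square-root-like transformation
inv_box_cox(w, 0.5)      # recovers y
```

Values of lambda below 1 compress large observations more than small ones, which is why they can even out a variance that grows with the level of the series.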

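Question 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

The Box-Cox transformation cannot stabilise the variance of the cangas series because the seasonal variation first increases and then decreases over time. Box-Cox transformations help when the variation changes steadily with the level of the series, which is not the case here. The plots of the data before and after the transformation look almost identical.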

(lambda1 <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(cangas)

autoplot(BoxCox(cangas,lambda1))
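
The same behaviour can be reproduced on a small synthetic example. The series below is made up for illustration and is not the cangas data: its level rises steadily while its seasonal swing grows until mid-sample and then shrinks. The helper names (`box_cox`, `yearly_range`) and the lambda values tried are assumptions chosen for the sketch.

```r
# Synthetic monthly series (NOT the cangas data): the level rises steadily
# while the seasonal swing grows until mid-sample and then shrinks again.
n_years <- 30
t <- seq_len(12 * n_years)
level <- seq(5, 20, length.out = length(t))
half <- (length(t) + 1) / 2
amp <- 2 - 1.5 * abs(t - half) / (half - 1)   # 0.5 at both ends, about 2 in the middle
y <- level + amp * sin(2 * pi * t / 12)
year <- rep(seq_len(n_years), each = 12)

# Illustrative Box-Cox helper for positive data.
box_cox <- function(y, lambda) if (lambda == 0) log(y) else (y^lambda - 1) / lambda

# Within-year range (max - min) as a rough measure of seasonal variation.
yearly_range <- function(x) tapply(x, year, function(v) diff(range(v)))

for (lambda in c(1, 0.5, 0)) {
  r <- yearly_range(box_cox(y, lambda))
  cat(sprintf("lambda = %.1f: first %.3f, middle %.3f, last %.3f\n",
              lambda, r[1], r[15], r[30]))
}
```

For each lambda tried here, the middle of the sample still has a wider seasonal range than both the first and the last year, so the transformation does not even out the variation.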

Question 3.3

The BoxCox.lambda function chose a value of 0.1276369, so this is the transformation I would select for the retail data.

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
  frequency=12, start=c(1982,4))

(lambda6 <- BoxCox.lambda(myts))
## [1] 0.1276369
autoplot(BoxCox(myts,lambda6))

Question 3.8

a. Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
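
Besides the plot, the split can also be checked numerically. The sketch below uses a stand-in series named `demo` (a hypothetical name, with arbitrary values) that has the same start date and frequency as myts, since only the time index matters for this check.

```r
# Stand-in monthly series with the same start and frequency as myts
# (the values are arbitrary; only the time index matters here).
demo <- ts(seq_len(381), frequency = 12, start = c(1982, 4))

train <- window(demo, end = c(2010, 12))
test  <- window(demo, start = 2011)

end(train)     # last training period
start(test)    # first test period
length(train) + length(test) == length(demo)   # no observation lost or duplicated
```

The same three checks can be run on myts.train and myts.test directly.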

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
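
To make clear what a seasonal naive forecast is, here is a minimal base R sketch of the idea: each forecast equals the last observed value from the same season. `snaive_sketch` is a hypothetical helper written for illustration; it returns point forecasts only and is not the forecast package's implementation.

```r
# Seasonal naive forecast: each future value equals the last observed value
# from the same season (illustrative helper, not forecast::snaive itself).
snaive_sketch <- function(train, h) {
  m <- frequency(train)
  n <- length(train)
  last_season <- as.numeric(train)[(n - m + 1):n]
  ts(rep(last_season, length.out = h),
     frequency = m,
     start = tsp(train)[2] + 1 / m)
}

# Two years of made-up quarterly data.
train <- ts(c(10, 20, 30, 40, 11, 21, 31, 41), frequency = 4, start = c(2000, 1))
snaive_sketch(train, h = 6)   # repeats 11, 21, 31, 41, 11, 21
```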

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)
##                     ME     RMSE      MAE       MPE      MAPE     MASE      ACF1
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000 0.7385090
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907 0.5315239
##              Theil's U
## Training set        NA
## Test set      1.297866
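
The main columns of this table can be reproduced by hand. The sketch below uses made-up numbers and a hypothetical helper, `accuracy_sketch`, and assumes the MASE is scaled by the in-sample seasonal naive MAE. Under that scaling, a seasonal naive model has a training MASE of exactly 1, which matches the table above.

```r
# Error measures for forecasts `f` against actuals `y`; `train` supplies the
# MASE scaling factor (in-sample seasonal naive MAE with period m).
accuracy_sketch <- function(f, y, train, m) {
  e <- y - f
  scale <- mean(abs(diff(as.numeric(train), lag = m)))
  c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)),
    MAPE = mean(abs(100 * e / y)), MASE = mean(abs(e)) / scale)
}

train <- c(10, 20, 30, 40, 12, 22, 32, 42)   # two "years" of quarterly data
y <- c(15, 25, 35, 45)                        # actual values for the next year
f <- train[5:8]                               # seasonal naive forecast
accuracy_sketch(f, y, train, m = 4)
```

Here every forecast error is 3 and the in-sample seasonal differences are all 2, so ME, RMSE and MAE are 3 and the MASE is 1.5.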

e. Check the residuals.

Do the residuals appear to be uncorrelated and normally distributed? They appear to be roughly normally distributed, with a slight positive skew. They do not appear to be uncorrelated: the ACF plot shows a clear pattern, with correlations that shrink at later lags and then turn negative. The Ljung-Box test (Q* = 624.45, p-value < 2.2e-16) also rejects the hypothesis that the residuals are uncorrelated.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
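
The Q* statistic reported above can be computed by hand from the sample autocorrelations. The sketch below does this for simulated data (not the retail residuals) and checks the result against Box.test() from the stats package. `ljung_box_q` is a hypothetical helper name, and the AR(1) series is an assumption used only to have autocorrelated input.

```r
# Ljung-Box statistic computed by hand and checked against stats::Box.test.
ljung_box_q <- function(x, h) {
  n <- length(x)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]   # drop lag 0
  n * (n + 2) * sum(r^2 / (n - seq_len(h)))
}

set.seed(1)
x <- arima.sim(list(ar = 0.7), n = 200)   # simulated, autocorrelated "residuals"
q <- ljung_box_q(x, h = 24)
bt <- Box.test(x, lag = 24, type = "Ljung-Box")
c(manual = q, box_test = unname(bt$statistic), p_value = bt$p.value)
```

A large Q* relative to a chi-squared distribution with h degrees of freedom gives a small p-value, which is evidence that the residuals are autocorrelated.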

f. How sensitive are the accuracy measures to the training/test split?

The errors differ substantially between the training and test sets. The test set has larger errors than the training set for the mean error (55.30 vs. 7.77), root mean squared error (71.44 vs. 20.25), and mean absolute error (55.78 vs. 15.96). It would appear that the model is not generalizing well to the test data.
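
One way to probe the sensitivity directly is to repeat the evaluation for several split points and compare the test errors. The sketch below does this on a synthetic series with trend and seasonality, which is a stand-in for myts and not the retail data. `snaive_rmse`, the chosen split years, and the 24-month horizon are all assumptions made for illustration.

```r
# Synthetic monthly series with trend and seasonality (a stand-in for myts).
set.seed(42)
n <- 12 * 30
y <- ts(100 + 0.5 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 3),
        frequency = 12, start = c(1984, 1))

# Test RMSE of a seasonal naive forecast for a given final training year.
snaive_rmse <- function(y, last_train_year, h = 24) {
  train <- window(y, end = c(last_train_year, 12))
  test  <- window(y, start = c(last_train_year + 1, 1))
  m <- frequency(y)
  h <- min(h, length(test))
  f <- rep(tail(as.numeric(train), m), length.out = h)
  sqrt(mean((as.numeric(test)[seq_len(h)] - f)^2))
}

split_years <- 2005:2011
rmse_by_split <- sapply(split_years, function(yr) snaive_rmse(y, yr))
round(setNames(rmse_by_split, split_years), 2)
```

If the test RMSE changes a lot from one split year to the next, the accuracy measures are sensitive to where the split is made.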