hw3

library(fpp2)
library(tidyverse)

BoxCox.lambda(usnetelec)

## [1] 0.5167714

BoxCox.lambda(usgdp)

## [1] 0.366352

BoxCox.lambda(mcopper)

## [1] 0.1919047

BoxCox.lambda(enplanements)

## [1] -0.2269461

l <- BoxCox.lambda(cangas)
cdf <- cbind(cangas, BoxCox(cangas, l))
autoplot(cdf, facet = TRUE)

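A Box-Cox transformation is of little help for the cangas data. The seasonal variation is small at the start of the series, larger in the middle, and smaller again toward the end, so a single power transformation cannot make it the same size across the whole series. The faceted plot shows the transformed series keeping much the same pattern of variation as the original, so the transformation would not simplify a forecasting model here.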
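For reference, the transformation that BoxCox() applies can be sketched in base R. This is a minimal illustration for strictly positive data only; the helper name `box_cox` and the toy vector are assumptions made for this example and are not part of fpp2.

```r
# Hypothetical helper (not from fpp2): Box-Cox transform for positive data.
# lambda = 0 gives the natural log; otherwise (y^lambda - 1) / lambda.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) {
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

# Toy values spanning several orders of magnitude (illustrative only).
y <- c(1, 10, 100, 1000)

box_cox(y, 0)    # log scale: compresses large values the most
box_cox(y, 0.5)  # square-root-like scale: milder compression
box_cox(y, 1)    # y - 1: shape of the data is unchanged
```

Smaller values of lambda shrink large observations more strongly, which is why the transformation helps only when the variation grows steadily with the level of the series.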

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],  frequency=12, start=c(1982,4))
BoxCox.lambda(myts)

## [1] 0.1276369

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

fc <- snaive(myts.train)
accuracy(fc,myts.test)

##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.5315239  1.297866

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

The residuals appear to be correlated. In the ACF plot the spikes at small lags are large and positive and decay slowly as the lag increases, which is consistent with trending data. The Ljung-Box statistic is large (Q* = 624.45 on 24 degrees of freedom) and the p-value is below 2.2e-16, so we reject the null hypothesis of no autocorrelation: the residuals can be distinguished from white noise. The training-set mean error of 7.77 also shows that the residuals are not centered on zero.
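The Ljung-Box result can be illustrated with base R's Box.test() on simulated data. This is only a sketch: the two series below are made up for the example (they are not the retail residuals), and the lag of 24 simply mirrors the number of lags reported by checkresiduals() above.

```r
# Simulated series, illustrative only (not the retail data).
set.seed(1)
n <- 200
noise   <- rnorm(n)                      # white noise
trended <- 0.5 * seq_len(n) + rnorm(n)   # linear trend plus noise

# Ljung-Box test with 24 lags on each series.
lb_noise   <- Box.test(noise,   lag = 24, type = "Ljung-Box")
lb_trended <- Box.test(trended, lag = 24, type = "Ljung-Box")

# A very small p-value rejects the null hypothesis of no autocorrelation.
lb_noise$p.value
lb_trended$p.value
```

The trended series produces a far larger test statistic than the white noise, the same signature as the slowly decaying ACF described above.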

How sensitive are the accuracy measures to the training/test split?

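Every accuracy measure differs substantially between the training and test sets, and each error measure is lower (more accurate) on the training set than on the test set: RMSE is 20.2 versus 71.4 and MASE is 1.00 versus 3.50. This is expected. The training-set errors come from forecasts made one season ahead, whereas the test-set forecasts all repeat the final year of the training data, so they fall further behind a trending series as the horizon grows. The test-set figures therefore depend heavily on where the split is placed.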
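One way to probe this sensitivity is to repeat the evaluation with several split points. The sketch below is an assumption-laden illustration in base R: it uses the built-in AirPassengers series instead of the retail data so that it runs without retail.xlsx, and the helper `snaive_test_rmse` is a hypothetical hand-rolled seasonal naive forecast, not the snaive() function from fpp2.

```r
# Hypothetical helper: test-set RMSE of a seasonal naive forecast,
# training through December of train_end_year and testing on the next year.
snaive_test_rmse <- function(y, train_end_year) {
  train <- window(y, end = c(train_end_year, 12))
  test  <- window(y, start = c(train_end_year + 1, 1),
                  end = c(train_end_year + 1, 12))
  m <- frequency(y)
  # Seasonal naive: repeat the last observed season of the training data.
  last_season <- tail(as.numeric(train), m)
  fc <- rep(last_season, length.out = length(test))
  sqrt(mean((as.numeric(test) - fc)^2))
}

# Built-in monthly series (1949 to 1960), used as a stand-in for the retail data.
y <- AirPassengers
splits <- 1955:1959

rmse <- sapply(splits, snaive_test_rmse, y = y)
names(rmse) <- splits
rmse
```

If the RMSE values change a lot from one split to the next, the accuracy measures are sensitive to the choice of training/test split; the same loop could be applied to myts by adjusting the years.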

Randall Thompson

9/12/2020