library(fpp2)
library(tidyverse)
BoxCox.lambda(usnetelec)
## [1] 0.5167714
BoxCox.lambda(usgdp)
## [1] 0.366352
BoxCox.lambda(mcopper)
## [1] 0.1919047
BoxCox.lambda(enplanements)
## [1] -0.2269461
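# Estimate lambda, then plot the original and transformed series in separate panels
lambda <- BoxCox.lambda(cangas)
cdf <- cbind(Original = cangas, Transformed = BoxCox(cangas, lambda))
autoplot(cdf, facets = TRUE)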
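The Box-Cox transformation is not very helpful for the cangas data. The seasonal variation is small in the early years, larger in the middle of the series, and smaller again at the end, even though the level rises throughout. Because a Box-Cox transformation is monotonic in the level of the series, it cannot make variation that first grows and then shrinks about the same across the whole series, so the transformed series still shows uneven seasonal variation.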
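For reference, here is a minimal base-R sketch of the transformation itself. The helper `box_cox` and the toy series are assumptions made for illustration and are not part of fpp2; the `BoxCox()` function from the forecast package handles more cases. The sketch shows the situation where the transformation does what it is meant to do: seasonal swings that are proportional to the level become constant after a log transformation (lambda = 0).

```r
# Hypothetical helper (not from fpp2): Box-Cox transform of a positive series.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Toy monthly series (assumed data): the seasonal swing grows with the level.
t <- 1:120
y <- exp(0.02 * t) * (1 + 0.3 * sin(2 * pi * t / 12))
w <- box_cox(y, lambda = 0)

# Width of the within-year range in the first and the last year.
range_width <- function(x) diff(range(x))
raw    <- c(first = range_width(y[1:12]), last = range_width(y[109:120]))
logged <- c(first = range_width(w[1:12]), last = range_width(w[109:120]))

print(raw)     # the last-year swing is much wider than the first-year swing
print(logged)  # the two swings are the same width after the transformation
```

Plotting `y` and `w` side by side gives the same kind of comparison as the faceted `autoplot()` call above.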
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
BoxCox.lambda(myts)
## [1] 0.1276369
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
accuracy(fc,myts.test)
##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.5315239  1.297866
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
The residuals appear to be correlated. In the ACF, the autocorrelations at small lags are large and positive and decrease slowly as the lag increases, which is consistent with trending data. The Q* statistic is large (624.45) and the p-value is below 2.2e-16, so we reject the null hypothesis of no autocorrelation: the residuals can be distinguished from white noise. The training-set mean error of 7.77 also shows that the residuals are not centered on zero.
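The Ljung-Box part of this check can be sketched with base R's `Box.test()`. The built-in `AirPassengers` series is used here as a stand-in (an assumption, since `retail.xlsx` may not be available), and the residuals of a seasonal naive fit are computed by hand as lag-12 differences. This is an illustration of the test, not a reproduction of the `checkresiduals()` output above.

```r
# Seasonal naive residuals are lag-12 differences: e_t = y_t - y_{t-12}.
res <- diff(AirPassengers, lag = 12)

# Ljung-Box test over 24 lags with no fitted parameters (fitdf = 0).
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
print(lb)

# Same test on simulated white noise of the same length, for comparison.
set.seed(1)
wn <- rnorm(length(res))
lb_wn <- Box.test(wn, lag = 24, type = "Ljung-Box", fitdf = 0)
print(lb_wn)
```

A small p-value for `res` means the seasonal naive method has left autocorrelation in the residuals, which is the same conclusion drawn for the retail series.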
How sensitive are the accuracy measures to the training/test split?
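The accuracy measures differ substantially between the training and test sets. Every error measure (ME, RMSE, MAE, MPE, MAPE, MASE) is lower, that is, more accurate, on the training set than on the test set; for example, RMSE rises from 20.2 to 71.4 and MASE from 1.00 to 3.50. Only ACF1 is lower on the test set. This makes sense because the training errors are in-sample, one-season-ahead errors, whereas the test errors come from forecasts made beyond the end of the training data, and a trending series drifts further from its last observed season as the horizon grows. Judging how sensitive the measures are to the split itself requires repeating the evaluation with different split points.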
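One way to probe the sensitivity directly is to move the split point and recompute the test-set errors each time. The sketch below does this in base R, with the built-in `AirPassengers` series as a stand-in (an assumption) and a hand-written seasonal naive forecast. The helper name `snaive_test_error` and the choice of split years are hypothetical.

```r
# Hypothetical helper: test-set errors of a seasonal naive forecast for one split.
# Assumes monthly data and a training set that ends in December.
snaive_test_error <- function(y, train_end_year, h = 24) {
  train <- window(y, end = c(train_end_year, 12))
  test  <- as.numeric(window(y, start = c(train_end_year + 1, 1)))
  h <- min(h, length(test))
  # Forecast = last observed season of the training data, repeated over h steps.
  last_season <- tail(as.numeric(train), frequency(y))
  fc <- rep(last_season, length.out = h)
  actual <- test[seq_len(h)]
  e <- actual - fc
  c(RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = mean(abs(e / actual)) * 100)
}

# Evaluate the same method with five different training end years.
splits <- 1953:1957
results <- t(sapply(splits, snaive_test_error, y = AirPassengers))
rownames(results) <- splits
print(round(results, 2))
```

If the rows of `results` differ a lot, the accuracy measures are sensitive to where the series is split. The same loop could be run on `myts` with `snaive()` and `accuracy()` from the forecast package.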