For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
autoplot(usnetelec)
(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(BoxCox(usnetelec,lambda))
autoplot(usgdp)
(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(BoxCox(usgdp,lambda))
autoplot(mcopper)
(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(BoxCox(mcopper,lambda))
autoplot(enplanements)
(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
autoplot(BoxCox(enplanements,lambda))
Why is a Box-Cox transformation unhelpful for the cangas data?
autoplot(cangas)
(lambda <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(BoxCox(cangas,lambda))
The plots before and after the transformation look essentially the same. According to Section 3.2 of our textbook: “If the data show variation that increases or decreases with the level of the series, then a transformation can be useful.” In the plot, the variation in the cangas data first increases and then decreases again in the more recent years, so it does not change consistently with the level of the series. A single Box-Cox transformation therefore cannot stabilise it.
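For reference, the Box-Cox family can be written in its standard form as below. This is stated for positive data, and the symbol w_t for the transformed series is introduced here only for illustration:

```latex
w_t =
\begin{cases}
  \log(y_t) & \text{if } \lambda = 0, \\[4pt]
  \dfrac{y_t^{\lambda} - 1}{\lambda} & \text{otherwise.}
\end{cases}
```

For any fixed λ this is a monotone function of y_t, so the amount of stretching or compression depends only on the level of the series. That is why it cannot correct variation that rises and then falls.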
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
#setwd("/Users/elinaazrilyan/Documents/Data624/")
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"],
           frequency=12, start=c(1982,4))
autoplot(myts)
(lambda <- BoxCox.lambda(myts))
## [1] 0.2324297
autoplot(BoxCox(myts,lambda))
For your retail time series (from Exercise 3 in Section 2.10):
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
accuracy(fc,myts.test)
##                    ME     RMSE      MAE      MPE     MAPE     MASE
## Training set 46.37387 59.27744 47.03213 7.755343 7.848159 1.000000
## Test set     67.42917 76.67352 67.42917 4.447568 4.447568 1.433683
##                   ACF1 Theil's U
## Training set 0.8279637        NA
## Test set     0.6327744  1.004278
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 2099.4, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
The mean of the residuals is not close to zero, so the forecasts are biased. The ACF of the residuals from the seasonal naïve method shows significant autocorrelation, which suggests that the forecasts could be improved. The histogram suggests that the residuals are not normal: the distribution is right-skewed. Also, the Ljung-Box test gives a very small p-value (< 2.2e-16), indicating that the residuals are distinguishable from a white noise series.
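The individual checks described above can also be computed directly rather than read off the `checkresiduals()` plots. The following is a minimal sketch. It assumes the forecast package is installed and uses the built-in `AirPassengers` series as a stand-in for `myts.train` so that it runs without `retail.xlsx`. The names `y_train`, `fc_demo`, `res`, `res_mean` and `lb` are illustrative.

```r
# Sketch only: AirPassengers stands in for myts.train so the block runs on its own.
library(forecast)

y_train <- window(AirPassengers, end = c(1958, 12))  # hypothetical stand-in series
fc_demo <- snaive(y_train)

# snaive() has no fitted values for the first seasonal period, so drop those NAs
res <- na.omit(residuals(fc_demo))

# A mean far from zero points to biased forecasts
res_mean <- mean(res)

# Ljung-Box test at lag 24 (two seasonal periods), matching the
# "Total lags used: 24" line in the checkresiduals() output
lb <- Box.test(res, lag = 24, type = "Ljung-Box")

print(res_mean)
print(lb)
```

With the retail data, replacing `y_train` by `myts.train` should reproduce the Ljung-Box statistic reported above.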
The accuracy measures are likely to be sensitive to the training/test split here, since the variation in the data increases over time, as the plot below shows. Putting a larger share of the data in the training set would probably give a better model, but it would leave a very small test set, and the test set needs to be at least as large as the maximum forecast horizon required. Time series cross-validation would give a more reliable estimate of forecast accuracy.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
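The time series cross-validation mentioned above could be sketched with `tsCV()` from the forecast package, which evaluates one-step forecasts from a rolling origin instead of relying on a single split. As an assumption made so the block runs on its own, the built-in `AirPassengers` series stands in for `myts`. The names `y`, `e` and `cv_rmse` are illustrative.

```r
# Sketch only: AirPassengers stands in for myts so the block runs without retail.xlsx.
library(forecast)

y <- AirPassengers  # hypothetical stand-in; replace with myts for the retail data

# One-step-ahead forecast errors from a rolling forecast origin.
# Origins with too little history for snaive() are returned as NA.
e <- tsCV(y, snaive, h = 1)

# Cross-validated RMSE, ignoring the NA entries
cv_rmse <- sqrt(mean(e^2, na.rm = TRUE))
print(cv_rmse)
```

Because every observation after the first seasonal period serves as a test point once, this figure depends much less on where a single training/test boundary is drawn.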