For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
usnetelec
usgdp
mcopper
enplanements
## [1] 0.5167714
original <- autoplot(usnetelec) + ggtitle("Original")
transformed <- autoplot(BoxCox(usnetelec,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)
## [1] 0.366352
original <- autoplot(usgdp) + ggtitle("Original")
transformed <- autoplot(BoxCox(usgdp,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)
Why is a Box-Cox transformation unhelpful for the cangas
data?
## [1] 0.5767759
original <- autoplot(cangas) + ggtitle("Original")
transformed <- autoplot(BoxCox(cangas,lambda)) + ggtitle("Box-Cox Transformation")
grid.arrange(original, transformed, nrow = 1)
Even after the transformation with the best \(\lambda = 0.5767759\) chosen, the variance is not constant or did not really change. In other words, the seasonal variation is not the same across the series which makes it harder to exlain.
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349399C"], frequency=12, start=c(1982,4)) #clothing
## [1] 0.02074707
With \(\lambda\) at 0.02074707
, the variance is constant which makes the forecasting model simpler.
For your retail time series (from Exercise 3 in Section 2.10):
snaive
applied to myts.train
## ME RMSE MAE MPE MAPE MASE
## Training set 9.007207 21.13832 16.58859 4.224080 7.494415 1.000000
## Test set 10.362500 21.50499 18.99583 2.771495 5.493632 1.145115
## ACF1 Theil's U
## Training set 0.5277855 NA
## Test set 0.7420700 0.3223094
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 342, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
The residuals appear to be correlated.
Reason: Not a white series.
According to the text, large values of
\(Q^*\) or the Ljung-Box test suggest that the autocorrelations do not come from a white noise series which in our case is true since the \(Q^*\) is large with value 342
.
For the \(Q^*\), the result is significant therefore the residuals are from a white series.
The regression above is therefore nugatory although it can be improved. There are lots of areas that need to be explored.
The residuals are not normally distributed. The residuals are not centered around 0 as there is a longer left tail.
fit1 <- meanf(myts.train, h=24)
fit2 <- rwf(myts.train,h=24)
fit3 <- snaive(myts.train,h=24)
autoplot(myts) +
autolayer(fit1, series="Mean", PI=FALSE) +
autolayer(fit2, series="Naïve", PI=FALSE) +
autolayer(fit3, series="Seasonal naïve", PI=FALSE) +
xlab("Year") + ylab("Turnover") +
ggtitle("Forecasts for Retail Turnover on Clothing") +
guides(colour=guide_legend(title="Forecast"))
## ME RMSE MAE MPE MAPE MASE
## Training set -3.688274e-15 84.8532 65.83623 -16.48880 36.09754 3.968766
## Test set 1.361463e+02 152.8584 136.14634 36.57186 36.57186 8.207229
## ACF1 Theil's U
## Training set 0.7766742 NA
## Test set 0.1762124 2.203431
## ME RMSE MAE MPE MAPE MASE
## Training set 1.29593 53.66712 34.48081 -1.712046 15.51772 2.078586
## Test set -185.11250 197.72838 189.03750 -56.668479 57.35239 11.395635
## ACF1 Theil's U
## Training set -0.2404895 NA
## Test set 0.1762124 2.992736
## ME RMSE MAE MPE MAPE MASE
## Training set 9.007207 21.13832 16.58859 4.224080 7.494415 1.000000
## Test set 10.362500 21.50499 18.99583 2.771495 5.493632 1.145115
## ACF1 Theil's U
## Training set 0.5277855 NA
## Test set 0.7420700 0.3223094
Accuaracy measures are always sensitive to training/test splits. The seasonal naive method is the best of these three methods for this dataset.