Forecasting
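The lambda value printed below is presumably the Box-Cox parameter estimated for the monthly copper series used in the next chunk; the code that produced it is not shown. A minimal sketch, assuming the fpp2 package is loaded and the series is stored in a:
library(fpp2)               # loads the forecast and ggplot2 packages used throughout (assumed)
lambdaa <- BoxCox.lambda(a) # automatic selection of the Box-Cox parameter
lambdaa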
## [1] 0.5167714
adf <- cbind(Raw = a, Transformed = BoxCox(a, lambdaa))
autoplot(adf, facet=TRUE) +
labs(title = "Monthly Copper Prices",
x = "Year", y = "")## [1] 0.366352
bdf <- cbind(Raw = b, Transformed = BoxCox(b, lambdab))
autoplot(bdf, facet=TRUE) +
labs(title = "US GDP",
x = "Year", y = "")The Box-Cox transformation is unhelpful because the variation does not increase or decrease however instead follows a complex pattern. The variation increases slowly, then increases rapidly, and decreases slowly towards the end. The transformed time series shows a pattern with a non-constant variance.
## [1] 0.5767759
twodf <- cbind(Raw = two, Transformed = BoxCox(two, lambdatwo))
autoplot(twodf, facet=TRUE) +
labs(title = "Monthly Canadian Gas Production",
x = "Year", y = "")For retail data, I choose the 7th variable column in the retail dataset which is close to 1.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
colID <- colnames(retaildata)[8]
myts <- ts(retaildata[ , colID], frequency=12, start=c(1982,4))
lambdathree <- BoxCox.lambda(myts)
lambdathree
## [1] 0.9165544
threedf <- cbind(Raw = myts, Transformed = BoxCox(myts, lambdathree))
autoplot(threedf, facet=TRUE) +
labs(title = "Turnover - NSW",
x = "Month", y = "")autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test") +
geom_vline(xintercept = 2011, lty = 2) +
labs(title = "Turnover - NSW",
x = "Month", y = "") +
guides(colour=guide_legend(title="Series"))
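The training and test sets layered in the plot above are not constructed in the code shown, and neither is the seasonal naive model whose accuracy is reported below. A minimal sketch of how they might have been produced; the object name fc and the split at the start of 2011 (marked by the dashed line above) are assumptions:
myts.train <- window(myts, end = c(2010, 12))   # training set up to December 2010
myts.test <- window(myts, start = 2011)         # test set from January 2011 onwards
fc <- snaive(myts.train, h = length(myts.test)) # seasonal naive forecasts over the test period
accuracy(fc, myts.test)                         # training vs test accuracy, as tabulated below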
##                      ME     RMSE      MAE      MPE      MAPE      MASE      ACF1 Theil's U
## Training set  9.460661 26.30758 21.23363 4.655690 12.762886 1.0000000 0.8070166        NA
## Test set     17.212500 21.26067 17.39583 4.748234  4.807728 0.8192584 0.4843871 0.6934111
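The residual diagnostics that follow match the standard output of checkresiduals(); a minimal sketch, assuming the seasonal naive fit is stored in fc as above:
checkresiduals(fc) # residual time plot, ACF, histogram, and the Ljung-Box test below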
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 856.11, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
The residuals are strongly autocorrelated (the Ljung-Box test rejects white noise with p-value < 2.2e-16). Their distribution is roughly normal, but the histogram is slightly skewed with a longer left tail, which indicates that the forecasts are biased.
The accuracy measures are quite sensitive to the training/test split: the training and test values differ substantially (for example, MAPE of about 12.8% on the training set versus 4.8% on the test set). This would suggest that the model does not generalize well.