library(readxl, quietly = TRUE, warn.conflicts = FALSE, verbose = FALSE)
library(fpp2, quietly = TRUE, warn.conflicts = FALSE, verbose = FALSE)
library(ggplot2)
library(gridExtra)
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
pre <- autoplot(usnetelec)
lambda <- BoxCox.lambda(usnetelec)
lambda
## [1] 0.5167714
post <- autoplot(BoxCox(usnetelec,lambda))
grid.arrange(pre, post)
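As a quick sanity check (a sketch, assuming the standard Box-Cox definition for a strictly positive series), the transformation for lambda != 0 is simply (y^lambda - 1) / lambda, so computing it by hand should reproduce BoxCox():
manual <- (usnetelec^lambda - 1) / lambda   # standard Box-Cox formula, lambda != 0
all.equal(as.numeric(manual), as.numeric(BoxCox(usnetelec, lambda)))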
pre <- autoplot(usgdp)
lambda <- BoxCox.lambda(usgdp)
lambda
## [1] 0.366352
post <- autoplot(BoxCox(usgdp,lambda))
grid.arrange(pre, post)
pre <- autoplot(mcopper)
lambda <- BoxCox.lambda(mcopper)
lambda
## [1] 0.1919047
post <- autoplot(BoxCox(mcopper,lambda))
grid.arrange(pre, post)
pre <- autoplot(enplanements)
lambda <- BoxCox.lambda(enplanements)
post <- autoplot(BoxCox(enplanements,lambda))
grid.arrange(pre, post)
Why is a Box-Cox transformation unhelpful for the cangas data?
Based on the plot below, the variation does not increase or decrease with the level of the series, so a Box-Cox transformation (which only adjusts level-dependent variance) does not help here; the series does not need a transformation.
pre <- autoplot(cangas)
lambda <- BoxCox.lambda(cangas)
lambda
## [1] 0.5767759
post <- autoplot(BoxCox(cangas,lambda))
grid.arrange(pre, post)
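To make this concrete (a minimal sketch; the lambda values compared here are arbitrary choices), we can place the original series next to a log transform and the estimated lambda. The change in variation in the middle of the series remains in every panel because it is not tied to the level:
grid.arrange(
  autoplot(cangas) + ggtitle("Original"),
  autoplot(BoxCox(cangas, 0)) + ggtitle("lambda = 0 (log)"),
  autoplot(BoxCox(cangas, lambda)) + ggtitle(paste("lambda =", round(lambda, 2))),
  ncol = 1
)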
What Box-Cox transformation would you select for your retail data?
Based on the plots below, a log transformation (lambda = 0). The estimated lambda is close to zero, retail sales are likely to grow roughly in proportion to population, and a log transformation is easier to explain; a side-by-side comparison follows the plots below.
retailts <- readxl::read_excel("retail.xlsx", skip=1)
retailts <- ts(retailts[,"A3349873A"],frequency=12, start=c(1982,4))
autoplot(retailts)
(lambda <- BoxCox.lambda(retailts))
## [1] 0.1276369
autoplot(BoxCox(retailts,lambda))
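As a quick comparison (a sketch): since the estimated lambda of about 0.13 is close to zero, the log transform and the BoxCox.lambda transform give very similar plots, which supports choosing the simpler log transformation.
grid.arrange(
  autoplot(BoxCox(retailts, 0)) + ggtitle("Log transform (lambda = 0)"),
  autoplot(BoxCox(retailts, lambda)) + ggtitle(paste("BoxCox.lambda =", round(lambda, 2))),
  ncol = 1
)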
myts <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(myts[,"A3349873A"],frequency=12, start=c(1982,4))
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
summary(fc)
##
## Forecast method: Seasonal naive method
##
## Model Information:
## Call: snaive(y = myts.train)
##
## Residual sd: 18.7223
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.772973 20.24576 15.95676 4.702754 8.109777 1 0.738509
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2011 266.2 240.2540 292.1460 226.5190 305.8810
## Feb 2011 240.0 214.0540 265.9460 200.3190 279.6810
## Mar 2011 267.5 241.5540 293.4460 227.8190 307.1810
## Apr 2011 260.7 234.7540 286.6460 221.0190 300.3810
## May 2011 272.8 246.8540 298.7460 233.1190 312.4810
## Jun 2011 260.5 234.5540 286.4460 220.8190 300.1810
## Jul 2011 268.5 242.5540 294.4460 228.8190 308.1810
## Aug 2011 277.0 251.0540 302.9460 237.3190 316.6810
## Sep 2011 278.7 252.7540 304.6460 239.0190 318.3810
## Oct 2011 279.0 253.0540 304.9460 239.3190 318.6810
## Nov 2011 319.3 293.3540 345.2460 279.6190 358.9810
## Dec 2011 400.2 374.2540 426.1460 360.5190 439.8810
## Jan 2012 266.2 229.5068 302.8932 210.0826 322.3174
## Feb 2012 240.0 203.3068 276.6932 183.8826 296.1174
## Mar 2012 267.5 230.8068 304.1932 211.3826 323.6174
## Apr 2012 260.7 224.0068 297.3932 204.5826 316.8174
## May 2012 272.8 236.1068 309.4932 216.6826 328.9174
## Jun 2012 260.5 223.8068 297.1932 204.3826 316.6174
## Jul 2012 268.5 231.8068 305.1932 212.3826 324.6174
## Aug 2012 277.0 240.3068 313.6932 220.8826 333.1174
## Sep 2012 278.7 242.0068 315.3932 222.5826 334.8174
## Oct 2012 279.0 242.3068 315.6932 222.8826 335.1174
## Nov 2012 319.3 282.6068 355.9932 263.1826 375.4174
## Dec 2012 400.2 363.5068 436.8932 344.0826 456.3174
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.772973 20.24576 15.95676 4.702754 8.109777 1.000000 0.7385090
## Test set 55.300000 71.44309 55.78333 14.900996 15.082019 3.495907 0.5315239
## Theil's U
## Training set NA
## Test set 1.297866
Based on the residual plot below, the variance is not constant and there are a few large negative residuals.
The ACF plot shows significant autocorrelation, so there is information left in the residuals that could be used to improve the forecasts.
The residual histogram does not look normal and the residuals do not appear to have zero mean, so forecasts from this method are likely to be biased (a quick check of the residual mean follows the Ljung-Box output below).
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
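A quick numeric check of the bias point above (a sketch): the mean of the training residuals equals the positive ME reported in the accuracy table, i.e. it is well above zero.
res <- residuals(fc)
mean(res, na.rm = TRUE)  # roughly 7.77, matching ME on the training set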
The accuracy measures above depend on the particular training/test split chosen. A better approach is time series cross-validation, for example with tsCV(), as sketched below.
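A minimal sketch of time series cross-validation with tsCV() (the horizon h = 12 is an arbitrary choice here): forecast errors are computed from a rolling origin, so the accuracy estimate no longer depends on a single split.
e <- tsCV(myts, snaive, h = 12)     # rolling-origin forecast errors
sqrt(colMeans(e^2, na.rm = TRUE))   # RMSE at each forecast horizon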