#install.packages("GGally")
library(GGally)
## Loading required package: ggplot2
library(fpp2)
## Loading required package: forecast
## Loading required package: fma
##
## Attaching package: 'fma'
## The following object is masked from 'package:GGally':
##
## pigs
## Loading required package: expsmooth
library(readxl)
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance: usnetelec, usgdp, mcopper, enplanements.
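Solution: A good value of λ is one which makes the size of the variation (for seasonal data, the seasonal variation) about the same across the whole series, as that makes the forecasting model simpler. BoxCox.lambda() chooses λ automatically (by default using Guerrero's method), and comparing the original and transformed plots shows whether the chosen value has evened out the variation.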
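For reference, the Box-Cox transformation of a positive series y_t with parameter λ is:

$$
w_t =
\begin{cases}
\log(y_t), & \text{if } \lambda = 0,\\[4pt]
\dfrac{y_t^{\lambda} - 1}{\lambda}, & \text{otherwise.}
\end{cases}
$$

λ = 1 only shifts the series down by 1, λ = 1/2 behaves like a square root (up to a linear rescaling, since (√y − 1)/0.5 = 2√y − 2), and λ = 0 is the natural logarithm. The values found below (about 0.52, 0.37, 0.19 and -0.23) therefore range from roughly a square root to a transformation slightly stronger than a log.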
usnetelec
(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(usnetelec)
autoplot(BoxCox(usnetelec,lambda))
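To make the role of λ concrete, here is a minimal base-R sketch of the transformation and its inverse for strictly positive data. box_cox() and inv_box_cox() are hypothetical helper names written for illustration only; in practice BoxCox() and InvBoxCox() from the forecast package do this job.

```r
# Hypothetical helpers: Box-Cox transform and its inverse, assuming y > 0.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))                     # simple form needs positive data
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 4, 9, 16)
box_cox(y, 0.5)                   # → 0 2 4 6
inv_box_cox(box_cox(y, 0.5), 0.5) # back to 1 4 9 16
```

The inverse matters because forecasts made on the transformed scale have to be back-transformed before they can be compared with the original data.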
usgdp
(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(usgdp)
autoplot(BoxCox(usgdp,lambda))
mcopper
(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(mcopper)
autoplot(BoxCox(mcopper,lambda))
enplanements
(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
autoplot(enplanements)
autoplot(BoxCox(enplanements,lambda))
Why is a Box-Cox transformation unhelpful for the cangas data?
(lambda <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(cangas)
autoplot(BoxCox(cangas,lambda))
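The Box-Cox transformation is unhelpful because the variation in cangas does not change in step with the level of the series. The seasonal fluctuations are small at the start, become much larger in the middle of the series, and then shrink again towards the end even though the level keeps rising. A single power transformation can only compress variation that grows with the level (or expand variation that shrinks with it), so it cannot even out variance that behaves like this.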
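One rough way to check this kind of claim is to split the series into consecutive blocks and compare how spread out each block is, before and after transforming. The sketch below uses a hypothetical helper, block_sd(), that measures the standard deviation of the month-to-month changes in each block; for a seasonal series these changes are dominated by the seasonal swings, so this is only a crude proxy for the seasonal amplitude. The toy data are made up for illustration.

```r
# Hypothetical helper: split the first differences of y into k consecutive
# blocks of (nearly) equal length and return the standard deviation of each.
block_sd <- function(y, k) {
  d <- diff(as.numeric(y))                         # month-to-month changes
  blocks <- cut(seq_along(d), breaks = k, labels = FALSE)
  tapply(d, blocks, sd)
}

# Toy series whose swings become ten times larger in the second half.
y <- cumsum(c(0, 1, -1, 1, -1, 10, -10, 10, -10))
block_sd(y, 2)

# With fpp2 loaded, one could compare, for example:
# block_sd(cangas, 3)
# block_sd(BoxCox(cangas, BoxCox.lambda(cangas)), 3)
```

If the blocks still differ a lot after the transformation, a single λ has not stabilised the variance.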
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
retaildata <- readxl::read_excel("C:/Users/Mezu/Documents/retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
autoplot(myts)
(lambda <- BoxCox.lambda(myts))
## [1] 0.1276369
autoplot(BoxCox(myts,lambda))
I will use λ ≈ 0.128 (the value 0.1276 returned by BoxCox.lambda()), as this helps to even out the variance across the series.
For your retail time series (from Exercise 3 in Section 2.10): Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
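Besides the plot, the split can be checked numerically. The sketch below builds a synthetic monthly series with the same start date as myts; its end date (December 2013) is an assumption for illustration only. It confirms that the two windows meet at January 2011 without overlapping or dropping observations.

```r
# Synthetic monthly series: 381 values from April 1982 to December 2013.
x <- ts(seq_len(381), frequency = 12, start = c(1982, 4))

x.train <- window(x, end = c(2010, 12))   # up to December 2010
x.test  <- window(x, start = 2011)        # January 2011 onwards

end(x.train)                                   # 2010 12
start(x.test)                                  # 2011 1
length(x.train) + length(x.test) == length(x)  # TRUE: no gaps, no overlap
```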
Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
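snaive() sets each forecast equal to the last observed value from the same season: with seasonal period m, the forecast h steps ahead is y_{T+h−m(k+1)}, where k is the integer part of (h−1)/m. A minimal base-R sketch of the point forecasts is shown below; prediction intervals are omitted, and snaive_point() is a hypothetical name, not a forecast function.

```r
# Seasonal naive point forecasts: cycle the last full season of y forward h steps.
snaive_point <- function(y, h, m = frequency(y)) {
  last_season <- tail(as.numeric(y), m)   # most recent m observations
  rep(last_season, length.out = h)        # repeat them for h steps
}

# Two years of toy monthly data: the forecasts repeat months 13 to 24.
y <- ts(1:24, frequency = 12)
snaive_point(y, h = 15)
# → 13 14 15 16 17 18 19 20 21 22 23 24 13 14 15
```

Applied to myts.train with the same horizon, this should reproduce the point forecasts stored in fc$mean.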
Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.5315239  1.297866
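MASE scales the forecast errors by the in-sample MAE of the seasonal naive method, which is why the training-set MASE is exactly 1 here. For the test set, 55.78333 / 15.95676 ≈ 3.496, so the test errors are about 3.5 times larger than the one-season-ahead training errors. Because the test ME (55.30) is almost equal to the test MAE (55.78), nearly all test errors are positive: the seasonal naive forecasts sit below the actual values almost every month. The sketch below recomputes these measures by hand, following the usual definitions; test_accuracy() is a hypothetical helper, not part of the forecast package, and the toy numbers are made up.

```r
# Hypothetical helper: test-set accuracy measures computed by hand.
# actual = observed test values, pred = point forecasts for the same periods,
# train  = training data used for the MASE scaling, m = seasonal period.
test_accuracy <- function(actual, pred, train, m) {
  e  <- actual - pred                        # errors: actual minus forecast
  pe <- 100 * e / actual                     # percentage errors
  scale <- mean(abs(diff(train, lag = m)))   # in-sample MAE of seasonal naive
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(pe),
    MAPE = mean(abs(pe)),
    MASE = mean(abs(e)) / scale)
}

# Toy example with m = 2: the scale is mean(|12 - 10|, |22 - 20|) = 2.
test_accuracy(actual = c(15, 25), pred = c(12, 22), train = c(10, 20, 12, 22), m = 2)
# → ME 3, RMSE 3, MAE 3, MPE 16, MAPE 16, MASE 1.5
```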
Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
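The Ljung-Box statistic is Q* = T(T+2) Σ r_k² / (T−k), summed over lags k = 1 to ℓ, where r_k is the lag-k autocorrelation of the residuals. It is compared with a χ² distribution with ℓ − K degrees of freedom; here ℓ = 24 lags and K = 0 model parameters, so df = 24, and Q* = 624.45 is far beyond anything plausible for white noise. The base-R sketch below computes the statistic by hand on a simulated random walk (an assumption for illustration; the snaive residuals are not used) and checks it against stats::Box.test().

```r
# Ljung-Box statistic computed directly from the sample autocorrelations.
ljung_box <- function(x, lag) {
  n <- length(x)
  r <- acf(x, lag.max = lag, plot = FALSE)$acf[-1]   # r_1, ..., r_lag
  n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
}

set.seed(1)
x <- cumsum(rnorm(200))      # simulated random walk: strongly autocorrelated

q     <- ljung_box(x, lag = 24)
q_ref <- Box.test(x, lag = 24, type = "Ljung-Box")$statistic
all.equal(q, unname(q_ref))              # TRUE
pchisq(q, df = 24, lower.tail = FALSE)   # p-value, far below 0.05 here
```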
Do the residuals appear to be uncorrelated and normally distributed?
No. The residuals are not uncorrelated: the ACF of the residuals shows large autocorrelations, and the Ljung-Box test (Q* = 624.45, df = 24, p < 2.2e-16) strongly rejects the hypothesis that they are white noise. The time plot of the residuals also shows a pattern, and their variation is not constant over time. The histogram suggests the residuals are roughly normally distributed, but they are not centred on zero: the mean training error is positive (ME = 7.77), so the seasonal naive forecasts are biased low. The point forecasts are therefore likely to be poor, and because the residuals are correlated and their variance is not constant, prediction intervals based on the usual normality assumption should be treated with caution.
How sensitive are the accuracy measures to the training/test split?
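One way to gauge how sensitive the accuracy measures are to the training/test split is to move the split point and recompute a test measure each time: if the results change a lot from one split year to the next, the measures are sensitive to where the split is made. The sketch below does this in base R for seasonal naive forecasts of the 12 months after each split. split_sensitivity() is a hypothetical helper, and the series is simulated (trend plus growing seasonality plus noise), so its numbers say nothing about the retail data; with fpp2 loaded, the same loop could use snaive() and accuracy() on myts instead.

```r
# Hypothetical helper: for each split year, train on data up to the last period
# of that year, forecast the next h periods with the seasonal naive method and
# return the test-set MAE. Assumes at least h observations follow each split.
split_sensitivity <- function(y, split_years, h = 12) {
  m <- frequency(y)
  sapply(split_years, function(yr) {
    train <- window(y, end = c(yr, m))
    test  <- head(as.numeric(window(y, start = c(yr + 1, 1))), h)
    pred  <- rep(tail(as.numeric(train), m), length.out = h)  # seasonal naive
    mean(abs(test - pred))                                     # test-set MAE
  })
}

# Simulated monthly series, January 1990 to December 2009 (not the retail data).
set.seed(42)
tt <- 1:240
y  <- ts(100 + 0.5 * tt + (1 + tt / 60) * 10 * sin(2 * pi * tt / 12) + rnorm(240, sd = 3),
         frequency = 12, start = c(1990, 1))
mae <- split_sensitivity(y, split_years = 2000:2008)
names(mae) <- 2000:2008
round(mae, 2)
```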