Exercise 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance: usnetelec, usgdp, mcopper, enplanements.

lambda<-BoxCox.lambda(usnetelec)
autoplot(BoxCox(usnetelec, lambda))

lambda
## [1] 0.5167714
lambda<-BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, lambda))

lambda
## [1] 0.366352
lambda<-BoxCox.lambda(mcopper)
autoplot(BoxCox(mcopper, lambda))

lambda
## [1] 0.1919047
lambda<-BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements, lambda))

lambda
## [1] -0.2269461
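
For reference, the transformation applied to a strictly positive series can be sketched in base R. The helpers `box_cox` and `inv_box_cox` are hypothetical names written for illustration, not functions from the forecast package, and the sample values are made up.

```r
# Box-Cox transform for positive data: log(y) when lambda is 0,
# otherwise (y^lambda - 1) / lambda.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, used to bring values back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(50, 100, 200, 400)            # made-up positive values
w <- box_cox(y, lambda = 0.5)        # lambda close to the usnetelec estimate
y_back <- inv_box_cox(w, lambda = 0.5)
print(all.equal(y, y_back))          # the round trip should recover y
```

A lambda of 1 only shifts the series, a lambda of 0 is the natural log, and a negative lambda (as estimated for enplanements) compresses large values more strongly than the log does.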

Exercise 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

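A good forecasting method should leave residuals that are uncorrelated and have zero mean; ideally they also have constant variance and are normally distributed. The residuals from a naive forecast of the cangas data appear slightly right skewed in the histogram. The ACF plot shows clear autocorrelation, with rises and falls every 12 months, and the time plot of the residuals shows a pattern in which they grow from year to year. A Box-Cox transformation is only useful when the variation changes steadily with the level of the series, and it does nothing to remove seasonal autocorrelation of this kind.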

res<-residuals(naive(cangas))
#autoplot(res)+xlab("Day")+ylab("")+
#  ggtitle("Residuals from naive method")
#gghistogram(res) +ggtitle("Histogram of residuals")
#ggAcf(res) + ggtitle("ACF of residuals")
checkresiduals(naive(res))

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 1846.3, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
Box.test(res, lag=10, fitdf=0) #Box-Pierce test
## 
##  Box-Pierce test
## 
## data:  res
## X-squared = 206.77, df = 10, p-value < 2.2e-16
Box.test(res, lag=10, fitdf=0, type="Lj") #Box-Ljung test
## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 209.79, df = 10, p-value < 2.2e-16
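
The two portmanteau statistics above can be reproduced by hand, which makes clear what `Box.test` is measuring. This is a minimal sketch in base R on a made-up random walk, not the cangas data; the variable names are illustrative.

```r
set.seed(1)
y <- cumsum(rnorm(200))   # made-up random walk
e <- diff(y)              # naive residuals: y[t] - y[t-1]

h <- 10                   # number of lags tested
n <- length(e)
r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]   # autocorrelations at lags 1..h

q_bp <- n * sum(r^2)                               # Box-Pierce statistic
q_lb <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))  # Ljung-Box statistic

bp <- Box.test(e, lag = h, type = "Box-Pierce")
lb <- Box.test(e, lag = h, type = "Ljung-Box")
print(c(by_hand = q_bp, box_test = unname(bp$statistic)))
print(c(by_hand = q_lb, box_test = unname(lb$statistic)))
```

Both statistics are weighted sums of squared autocorrelations, so large values mean the residuals still contain structure that the forecasting method has not captured.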

Exercise 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
retaildata_ts<-ts(retaildata[,"A3349397X"],frequency=12, start=c(1982,12), end=c(2013,12))
l<-BoxCox.lambda(retaildata_ts)

Based on the plots and the accuracy statistics, seasonal naive is the best of the three methods for the retail data, and the Box-Cox parameter selected by BoxCox.lambda is approximately 0.455. The seasonal naive forecasts follow the historical pattern much more closely than the mean and naive forecasts. The training-set error measures confirm this: RMSE, MAE, MAPE and MASE are all lower for seasonal naive than for the other two methods.

retail1<-meanf(retaildata_ts,h=10,lambda=l)
retail2<-rwf(retaildata_ts,h=10, lambda=l)
retail3<-snaive(retaildata_ts,h=10, lambda=l)
autoplot(window(retaildata_ts))+
  autolayer(retail1, series = "Mean", PI=FALSE) +
  autolayer(retail2, series = "Naive", PI=FALSE) +
  autolayer(retail3, series = "Seasonal naive", PI=FALSE) +
  xlab("Year") + ylab("Retail Sales") +
  ggtitle("Forecasts for monthly retail sales") +
  guides(colour=guide_legend(title="Forecast"))

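# par(mfrow) only affects base graphics; autoplot() returns ggplot2 objects,
# so the three plots below are drawn one after another.
par(mfrow = c(1,3))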
autoplot(naive(retaildata_ts))

autoplot(meanf(retaildata_ts))

autoplot(snaive(retaildata_ts))

retaildata_win<-ts(retaildata[,"A3349397X"],start=1982)
accuracy(retail1, retaildata_win)
##                      ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set   36.11161 292.5127 241.6035 -18.31754 46.29738 3.687407 0.9284374
## Test set     -147.71278 147.7128 147.7128 -32.95689 32.95689 2.254426        NA
accuracy(retail2, retaildata_win)
##                       ME     RMSE       MAE          MPE      MAPE     MASE
## Training set    2.046505 106.3225  65.52124   -0.9436343  10.78083 1.000000
## Test set     -524.400000 524.4000 524.40000 -117.0013387 117.00134 8.003512
##                    ACF1
## Training set -0.2745813
## Test set             NA
accuracy(retail3, retaildata_win)
##                      ME      RMSE       MAE         MPE       MAPE      MASE
## Training set   28.75651  49.60846  39.23795    4.739189   6.698247 0.5988585
## Test set     -594.40000 594.40000 594.40000 -132.619366 132.619366 9.0718679
##                   ACF1
## Training set 0.6530419
## Test set            NA
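
To make the columns of these tables concrete, the error measures can be computed by hand. The sketch below uses base R and made-up numbers; `error_measures` is a hypothetical helper, and it assumes the MASE is scaled by the in-sample MAE of a lag-m naive forecast.

```r
# Hypothetical helper: error measures for forecasts `fc` of `actual`.
# `train` and the period `m` set the MASE scale (in-sample lag-m naive MAE).
error_measures <- function(actual, fc, train, m = 1) {
  e <- actual - fc
  scale <- mean(abs(diff(train, lag = m)))
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / actual),
    MAPE = mean(abs(100 * e / actual)),
    MASE = mean(abs(e)) / scale)
}

train  <- c(90, 95, 100, 98, 104, 110)   # made-up history
actual <- c(100, 110, 120)               # made-up future values
fc     <- c(90, 115, 120)                # made-up forecasts
m_out  <- error_measures(actual, fc, train, m = 1)
print(round(m_out, 3))
```

Because RMSE, MAE and MASE depend on the scale of the data while MPE and MAPE are percentages, only the percentage measures and the MASE are comparable across different series.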

Exercise 3.8

For your retail time series (from Exercise 3 in Section 2.10):

A

Split the data into two parts using

retaildata.train <- window(retaildata_ts, end=c(2010,12))
retaildata.test <- window(retaildata_ts, start=2011)

B

Check that your data have been split appropriately by producing the following plot.

autoplot(retaildata_ts) +
  autolayer(retaildata.train, series="Training") +
  autolayer(retaildata.test, series="Test")
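
Besides the plot, the split can be checked numerically. This is a small sketch in base R on a made-up monthly series rather than the retail data; it only illustrates how `window` treats the `end` and `start` arguments.

```r
# Made-up monthly series covering Jan 2008 to Dec 2013
y <- ts(seq_len(72), frequency = 12, start = c(2008, 1))

train <- window(y, end = c(2010, 12))   # up to and including Dec 2010
test  <- window(y, start = 2011)        # Jan 2011 onwards

# The two parts should be contiguous and together cover the whole series.
print(end(train))
print(start(test))
print(length(train) + length(test) == length(y))
```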

C

Calculate forecasts using snaive applied to myts.train (here, retaildata.train).

fc <- snaive(retaildata.train)
fc
##          Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2011         1029.9  964.4284 1095.372  929.7698 1130.030
## Feb 2011         1109.9 1044.4284 1175.372 1009.7698 1210.030
## Mar 2011         1032.0  966.5284 1097.472  931.8698 1132.130
## Apr 2011         1047.7  982.2284 1113.172  947.5698 1147.830
## May 2011         1015.7  950.2284 1081.172  915.5698 1115.830
## Jun 2011         1077.2 1011.7284 1142.672  977.0698 1177.330
## Jul 2011         1124.7 1059.2284 1190.172 1024.5698 1224.830
## Aug 2011         1409.4 1343.9284 1474.872 1309.2698 1509.530
## Sep 2011         1113.0 1047.5284 1178.472 1012.8698 1213.130
## Oct 2011          938.3  872.8284 1003.772  838.1698 1038.430
## Nov 2011         1028.6  963.1284 1094.072  928.4698 1128.730
## Dec 2011          994.3  928.8284 1059.772  894.1698 1094.430
## Jan 2012         1029.9  937.3092 1122.491  888.2945 1171.505
## Feb 2012         1109.9 1017.3092 1202.491  968.2945 1251.505
## Mar 2012         1032.0  939.4092 1124.591  890.3945 1173.605
## Apr 2012         1047.7  955.1092 1140.291  906.0945 1189.305
## May 2012         1015.7  923.1092 1108.291  874.0945 1157.305
## Jun 2012         1077.2  984.6092 1169.791  935.5945 1218.805
## Jul 2012         1124.7 1032.1092 1217.291  983.0945 1266.305
## Aug 2012         1409.4 1316.8092 1501.991 1267.7945 1551.005
## Sep 2012         1113.0 1020.4092 1205.591  971.3945 1254.605
## Oct 2012          938.3  845.7092 1030.891  796.6945 1079.905
## Nov 2012         1028.6  936.0092 1121.191  886.9945 1170.205
## Dec 2012          994.3  901.7092 1086.891  852.6945 1135.905
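
The interval widths in this table can be reproduced by hand. The sketch below assumes the seasonal naive forecast standard deviation at horizon h is sigma * sqrt(k + 1), where k is the number of complete years before h, and it takes sigma to be the training-set RMSE of 51.08777 that accuracy() reports for these forecasts. That choice of sigma is inferred from the printed numbers, not stated in the source.

```r
sigma <- 51.08777          # assumed residual standard deviation (training RMSE)
m     <- 12                # seasonal period
h     <- c(1, 13)          # Jan 2011 and Jan 2012
point <- 1029.9            # point forecast for both Januaries

k      <- floor((h - 1) / m)               # complete years before horizon h
half80 <- qnorm(0.90) * sigma * sqrt(k + 1)
half95 <- qnorm(0.975) * sigma * sqrt(k + 1)

intervals <- cbind(lo80 = point - half80, hi80 = point + half80,
                   lo95 = point - half95, hi95 = point + half95)
rownames(intervals) <- c("Jan 2011", "Jan 2012")
print(round(intervals, 4))
```

The second-year intervals are wider than the first-year ones by a factor of sqrt(2), which matches the jump between the Dec 2011 and Jan 2012 rows.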

D

Compare the accuracy of your forecasts against the actual values stored in myts.test (here, retaildata.test).

accuracy(fc,retaildata.test)
##                    ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set 30.91662 51.08777 40.58123 5.178970 7.165508 1.000000 0.6442078
## Test set     23.43333 48.86515 42.83333 1.879971 3.798222 1.055496 0.5135711
##              Theil's U
## Training set        NA
## Test set     0.3566344

E

Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?

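The residuals do not appear to be uncorrelated: the Ljung-Box test from checkresiduals gives Q* = 577.45 on 24 degrees of freedom with a p-value below 2.2e-16, and the lag-1 autocorrelation of the training residuals is about 0.64. The histogram looks roughly normal, although there is an outlier in the right tail. In addition, the mean is shifted to the right of zero (the training-set ME is about 30.9), possibly due to the outlier.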

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 577.45, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
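
As a quick check on how to read this output, the p-value can be recomputed from Q* and the degrees of freedom with the chi-squared distribution in base R. The 5% significance level used for the critical value is an assumption made for illustration.

```r
q_star <- 577.45   # Q* from the checkresiduals() output
df     <- 24       # lags used minus model degrees of freedom (24 - 0)

p_value <- pchisq(q_star, df = df, lower.tail = FALSE)
crit    <- qchisq(0.95, df = df)   # 5% critical value, assumed level

print(p_value < 2.2e-16)   # consistent with "p-value < 2.2e-16"
print(q_star > crit)       # TRUE means the no-autocorrelation hypothesis is rejected
```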

F

How sensitive are the accuracy measures to the training/test split?

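The accuracy measures are fairly similar across the training and test sets. The MASE is 1.00 on the training set and about 1.06 on the test set, so the out-of-sample errors are only slightly larger than the in-sample seasonal naive errors. The MAPE is about 7.2% on the training set and 3.8% on the test set, meaning the forecasts are off by roughly 7% and 4% of the actual values on average.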

accuracy(fc,retaildata.test)
##                    ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set 30.91662 51.08777 40.58123 5.178970 7.165508 1.000000 0.6442078
## Test set     23.43333 48.86515 42.83333 1.879971 3.798222 1.055496 0.5135711
##              Theil's U
## Training set        NA
## Test set     0.3566344
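
One way to probe the sensitivity directly is to repeat the comparison for several split points. The sketch below does this in base R with a hand-written seasonal naive forecast on a made-up monthly series; `snaive_point` and `split_mase` are hypothetical helpers, and none of the numbers relate to the retail data.

```r
# Made-up monthly series with trend and seasonality
set.seed(42)
t_idx <- 1:240
y <- ts(100 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(240, sd = 3),
        frequency = 12, start = c(1994, 1))

# Seasonal naive point forecasts: repeat the last observed year.
snaive_point <- function(train, h) {
  m <- frequency(train)
  rep(tail(as.numeric(train), m), length.out = h)
}

# Test-set MASE when the last `test_years` years are held out.
split_mase <- function(y, test_years) {
  m <- frequency(y)
  n <- length(y)
  n_test <- test_years * m
  train <- ts(y[1:(n - n_test)], start = start(y), frequency = m)
  test  <- as.numeric(y[(n - n_test + 1):n])
  fc    <- snaive_point(train, n_test)
  scale <- mean(abs(diff(as.numeric(train), lag = m)))  # in-sample snaive MAE
  mean(abs(test - fc)) / scale
}

mase_by_split <- sapply(1:5, function(k) split_mase(y, k))
names(mase_by_split) <- paste0(1:5, "y test")
print(round(mase_by_split, 3))
```

If the values change a lot as the test window grows, the accuracy measures are sensitive to where the series is split.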