For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance: usnetelec, usgdp, mcopper, enplanements.
library(fpp2)  # loads the forecast package and the series used below

lambda <- BoxCox.lambda(usnetelec)
autoplot(BoxCox(usnetelec, lambda))
lambda
## [1] 0.5167714
lambda <- BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, lambda))
lambda
## [1] 0.366352
lambda <- BoxCox.lambda(mcopper)
autoplot(BoxCox(mcopper, lambda))
lambda
## [1] 0.1919047
lambda <- BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements, lambda))
lambda
## [1] -0.2269461
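For reference, the transformation these lambda values feed into can be written out directly. A minimal sketch (the helper name manual_boxcox is purely illustrative) reproducing the forecast package's parameterisation for a positive series and checking it against BoxCox():
# Box-Cox transform of a positive series:
#   w_t = log(y_t)                    if lambda = 0
#   w_t = (y_t^lambda - 1) / lambda   otherwise
manual_boxcox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
lambda <- BoxCox.lambda(usnetelec)
all.equal(as.numeric(manual_boxcox(usnetelec, lambda)),
          as.numeric(BoxCox(usnetelec, lambda)))  # expect TRUE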
Why is a Box-Cox transformation unhelpful for the cangas data?
A good forecast method should produce residuals that are uncorrelated, have zero mean, have constant variance and are normally distributed. A Box-Cox transformation is unhelpful for the cangas data because the variability of the series is not a simple function of its level: the seasonal variation is largest in the middle of the series and smaller at both ends, so no single value of lambda can stabilise it. The residual diagnostics below show the remaining problems: the histogram is slightly right-skewed, the ACF plot shows strong correlation with rises and falls every 12 months, and the time plot of the residuals shows a pattern, with the residuals growing in size over the years.
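Before turning to the residual diagnostics, a small sketch illustrating the point about the variance (gridExtra::grid.arrange is used only to stack the two plots and is assumed to be installed); even with the automatically chosen lambda, the transformed series still shows uneven variation:
library(gridExtra)
grid.arrange(
  autoplot(cangas) + ggtitle("cangas: original"),
  autoplot(BoxCox(cangas, BoxCox.lambda(cangas))) +
    ggtitle("cangas: Box-Cox transformed (automatic lambda)"),
  nrow = 2
)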
res <- residuals(naive(cangas))
# autoplot(res) + xlab("Year") + ylab("") +
#   ggtitle("Residuals from naive method")
# gghistogram(res) + ggtitle("Histogram of residuals")
# ggAcf(res) + ggtitle("ACF of residuals")
checkresiduals(naive(cangas))
##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 1846.3, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Box.test(res, lag=10, fitdf=0) #Box-Pierce test
##
## Box-Pierce test
##
## data: res
## X-squared = 206.77, df = 10, p-value < 2.2e-16
Box.test(res, lag=10, fitdf=0, type="Lj") #Box-Ljung test
##
## Box-Ljung test
##
## data: res
## X-squared = 209.79, df = 10, p-value < 2.2e-16
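The Box tests above only address autocorrelation. As a quick numeric complement to the other residual properties mentioned earlier (zero mean, normality), a minimal sketch, with Shapiro-Wilk chosen only as one convenient normality test:
mean(res, na.rm = TRUE)      # should be close to zero for an unbiased method
shapiro.test(na.omit(res))   # a small p-value points to non-normal residuals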
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
retaildata_ts <- ts(retaildata[,"A3349397X"], frequency = 12, start = c(1982, 12), end = c(2013, 12))
l <- BoxCox.lambda(retaildata_ts)
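To judge the chosen lambda visually, a small sketch (again using gridExtra::grid.arrange purely for layout) plotting the raw and the transformed retail series together:
library(gridExtra)
grid.arrange(
  autoplot(retaildata_ts) + ggtitle("Retail series: original"),
  autoplot(BoxCox(retaildata_ts, l)) +
    ggtitle(paste("Retail series: Box-Cox transformed, lambda =", round(l, 3))),
  nrow = 2
)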
Based on the plots and the accuracy statistics, the seasonal naive method is the best of the three for this retail series, and the selected Box-Cox transformation parameter is lambda = 0.455. The seasonal naive forecasts follow the historical seasonal pattern much more closely than the mean and naive methods, and its training-set error measures (RMSE, MAE, MAPE and MASE) are lower than those of the other two methods.
retail1 <- meanf(retaildata_ts, h = 10, lambda = l)
retail2 <- rwf(retaildata_ts, h = 10, lambda = l)
retail3 <- snaive(retaildata_ts, h = 10, lambda = l)
autoplot(retaildata_ts) +
  autolayer(retail1, series = "Mean", PI = FALSE) +
  autolayer(retail2, series = "Naive", PI = FALSE) +
  autolayer(retail3, series = "Seasonal naive", PI = FALSE) +
  xlab("Year") + ylab("Retail Sales") +
  ggtitle("Forecasts for monthly retail sales") +
  guides(colour = guide_legend(title = "Forecast"))
# par(mfrow = ...) does not affect ggplot objects, so arrange the plots with gridExtra instead
gridExtra::grid.arrange(autoplot(naive(retaildata_ts)),
                        autoplot(meanf(retaildata_ts)),
                        autoplot(snaive(retaildata_ts)), nrow = 1)
retaildata_win <- ts(retaildata[,"A3349397X"], start = 1982)  # the full series as an annual ts, used as the actuals in accuracy() below
accuracy(retail1, retaildata_win)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 36.11161 292.5127 241.6035 -18.31754 46.29738 3.687407 0.9284374
## Test set -147.71278 147.7128 147.7128 -32.95689 32.95689 2.254426 NA
accuracy(retail2, retaildata_win)
## ME RMSE MAE MPE MAPE MASE
## Training set 2.046505 106.3225 65.52124 -0.9436343 10.78083 1.000000
## Test set -524.400000 524.4000 524.40000 -117.0013387 117.00134 8.003512
## ACF1
## Training set -0.2745813
## Test set NA
accuracy(retail3, retaildata_win)
## ME RMSE MAE MPE MAPE MASE
## Training set 28.75651 49.60846 39.23795 4.739189 6.698247 0.5988585
## Test set -594.40000 594.40000 594.40000 -132.619366 132.619366 9.0718679
## ACF1
## Training set 0.6530419
## Test set NA
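To make the comparison easier to scan, a small sketch gathering just the test-set rows of the three accuracy() calls above into one table:
rbind(
  Mean           = accuracy(retail1, retaildata_win)["Test set", ],
  Naive          = accuracy(retail2, retaildata_win)["Test set", ],
  Seasonal_naive = accuracy(retail3, retaildata_win)["Test set", ]
)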
For your retail time series (from Exercise 3 in Section 2.10):
Split the data into two parts using
retaildata.train <- window(retaildata_ts, end=c(2010,12))
retaildata.test <- window(retaildata_ts, start=2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(retaildata_ts) +
  autolayer(retaildata.train, series = "Training") +
  autolayer(retaildata.test, series = "Test")
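As an extra numeric check (a sketch; the plot above already shows the split), confirm that the two pieces meet at the end of 2010 and together cover the whole series:
c(end(retaildata.train), start(retaildata.test))                              # expect 2010 12 2011 1
length(retaildata.train) + length(retaildata.test) == length(retaildata_ts)   # expect TRUE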
Calculate forecasts using snaive applied to retaildata.train (called myts.train in the exercise).
fc <- snaive(retaildata.train)
fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2011 1029.9 964.4284 1095.372 929.7698 1130.030
## Feb 2011 1109.9 1044.4284 1175.372 1009.7698 1210.030
## Mar 2011 1032.0 966.5284 1097.472 931.8698 1132.130
## Apr 2011 1047.7 982.2284 1113.172 947.5698 1147.830
## May 2011 1015.7 950.2284 1081.172 915.5698 1115.830
## Jun 2011 1077.2 1011.7284 1142.672 977.0698 1177.330
## Jul 2011 1124.7 1059.2284 1190.172 1024.5698 1224.830
## Aug 2011 1409.4 1343.9284 1474.872 1309.2698 1509.530
## Sep 2011 1113.0 1047.5284 1178.472 1012.8698 1213.130
## Oct 2011 938.3 872.8284 1003.772 838.1698 1038.430
## Nov 2011 1028.6 963.1284 1094.072 928.4698 1128.730
## Dec 2011 994.3 928.8284 1059.772 894.1698 1094.430
## Jan 2012 1029.9 937.3092 1122.491 888.2945 1171.505
## Feb 2012 1109.9 1017.3092 1202.491 968.2945 1251.505
## Mar 2012 1032.0 939.4092 1124.591 890.3945 1173.605
## Apr 2012 1047.7 955.1092 1140.291 906.0945 1189.305
## May 2012 1015.7 923.1092 1108.291 874.0945 1157.305
## Jun 2012 1077.2 984.6092 1169.791 935.5945 1218.805
## Jul 2012 1124.7 1032.1092 1217.291 983.0945 1266.305
## Aug 2012 1409.4 1316.8092 1501.991 1267.7945 1551.005
## Sep 2012 1113.0 1020.4092 1205.591 971.3945 1254.605
## Oct 2012 938.3 845.7092 1030.891 796.6945 1079.905
## Nov 2012 1028.6 936.0092 1121.191 886.9945 1170.205
## Dec 2012 994.3 901.7092 1086.891 852.6945 1135.905
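A quick sanity check (a sketch) that these point forecasts behave as seasonal naive promises, i.e. the last observed year of the training data repeated for each forecast year:
all(fc$mean[1:12] == tail(retaildata.train, 12))   # expect TRUE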
Compare the accuracy of your forecasts against the actual values stored in retaildata.test (called myts.test in the exercise).
accuracy(fc,retaildata.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 30.91662 51.08777 40.58123 5.178970 7.165508 1.000000 0.6442078
## Test set 23.43333 48.86515 42.83333 1.879971 3.798222 1.055496 0.5135711
## Theil's U
## Training set NA
## Test set 0.3566344
Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?
The Ljung-Box test below strongly rejects the hypothesis of no autocorrelation (Q* = 577.45, p-value < 2.2e-16), so the residuals are not uncorrelated; the ACF1 values in the accuracy output above point the same way. The histogram looks roughly normal apart from a right-skewed outlier, and the mean of the residuals is shifted to the right of zero, possibly because of that outlier.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 577.45, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
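To put a number on the "mean shifted to the right of zero" observation, a small sketch comparing the residual mean with and without the single largest residual (the res_fc name is just for illustration):
res_fc <- residuals(fc)
mean(res_fc, na.rm = TRUE)                            # overall residual mean
mean(res_fc[-which.max(abs(res_fc))], na.rm = TRUE)   # mean with the largest residual removed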
How sensitive are the accuracy measures to the training/test split?
The accuracy measures are fairly similar for this split. The MASE is 1.00 on the training set and only slightly higher at about 1.06 on the test set, and a test-set MAPE of about 3.8% (roughly 7% on the training set) means the forecasts are off by only a few percent on average. The measures do depend on where the series is split, though; the sketch after the output below re-does the split at a different point to check this.
accuracy(fc,retaildata.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 30.91662 51.08777 40.58123 5.178970 7.165508 1.000000 0.6442078
## Test set 23.43333 48.86515 42.83333 1.879971 3.798222 1.055496 0.5135711
## Theil's U
## Training set NA
## Test set 0.3566344
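A minimal sketch of checking that sensitivity, re-doing the split at an earlier, arbitrarily chosen cut-off (end of 2008) and comparing the test-set error measures with those above:
train2 <- window(retaildata_ts, end = c(2008, 12))
test2  <- window(retaildata_ts, start = 2009)
fc2 <- snaive(train2, h = length(test2))
accuracy(fc2, test2)[, c("RMSE", "MAPE", "MASE")]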