Exercise 7.1: Consider the pigs series — the number of pigs slaughtered in Victoria each month.
pigsdata <- pigs
fc <- ses(pigsdata, h = 4)
summary(fc)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigsdata, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
autoplot(fc) +
autolayer(fitted(fc), series = 'Fitted') +
ylab("Number of Pigs Slaughtered in Victoria") +
xlab('Year')
The parameters estimated by the ses function are \(\alpha = 0.2971\) and the initial level \(\ell_0 = 77260.0561\).
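For reference, in component form simple exponential smoothing updates the level as a weighted average of the newest observation and the previous level, and every forecast equals the final estimated level:
\[
\ell_t = \alpha y_t + (1-\alpha)\,\ell_{t-1}, \qquad \hat{y}_{T+h|T} = \ell_T .
\]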
The reported \(\sigma\) is 10308.58 and the first point forecast is \(\hat{y} = 98816.41\), giving a calculated 95% prediction interval of \[\hat{y} \pm 1.96\sigma = 98816.41 \pm 1.96 \times 10308.58,\] i.e. \(78611.59 - 119021.23\), which compares extremely favorably with the \(78611.97 - 119020.8\) interval provided by R.
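As a quick sanity check, the same interval can be computed directly from the forecast object; a small sketch using the values printed above (fc$lower and fc$upper hold the intervals R produced):
y_hat <- fc$mean[1]            # first point forecast, 98816.41
sigma <- 10308.58              # sigma reported in summary(fc)
c(lower = y_hat - 1.96 * sigma, upper = y_hat + 1.96 * sigma)
cbind(fc$lower[1, 2], fc$upper[1, 2])   # R's own bounds; second column = 95% level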
Exercise 7.5: Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days' sales for paperback and hardcover books.
bookdata<- books
plot(bookdata)
This dataset contains two time series: one for sales of Paperback books and another for Hardcovers. Both series cover 30 days. There appears to be an upward trend in both, with the Paperback series showing more daily fluctuation than the Hardcover series. The Paperback data exhibits drastic but fairly consistent daily swings; those fluctuations appear to decrease over time, though there is too little data to be definitive on the subject.
Use the ses() function to forecast each series, and plot the forecasts.
fc2 <- ses(bookdata[,1], h = 4)
summary(fc2)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = bookdata[, 1], h = 4)
##
## Smoothing parameters:
## alpha = 0.1685
##
## Initial states:
## l = 170.8271
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
autoplot(fc2) +
autolayer(fitted(fc2), series = 'Fitted') +
ylab("Number of Paperback Books Sold") +
xlab('Day')
fc3 <- ses(bookdata[,2], h = 4)
summary(fc3)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = bookdata[, 2], h = 4)
##
## Smoothing parameters:
## alpha = 0.3283
##
## Initial states:
## l = 149.2861
##
## sigma: 33.0517
##
## AIC AICc BIC
## 315.8506 316.7737 320.0542
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
autoplot(fc3) +
autolayer(fitted(fc3), series = 'Fitted') +
ylab("Number of Hardcover Books Sold") +
xlab('Day')
As shown in the summaries above, the training RMSE is 33.63769 for the Paperback series and 31.93101 for the Hardcover series.
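These values can also be pulled out of the accuracy matrices programmatically rather than read off the printed summaries; a small sketch:
accuracy(fc2)[, "RMSE"]   # Paperback training RMSE
accuracy(fc3)[, "RMSE"]   # Hardcover training RMSE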
Exercise 7.6: We will continue with the daily sales of paperback and hardcover books in data set books. Apply Holt's linear method to the paperback and hardback series and compute four-day forecasts in each case.
holt1 <- holt(bookdata[,1], h=4)
holt2 <- holt(bookdata[,2], h=4)
summary(holt1)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = bookdata[, 1], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 170.699
## b = 1.2621
##
## sigma: 33.4464
##
## AIC AICc BIC
## 318.3396 320.8396 325.3456
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
summary(holt2)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = bookdata[, 2], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 147.7935
## b = 3.303
##
## sigma: 29.2106
##
## AIC AICc BIC
## 310.2148 312.7148 317.2208
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
autoplot(bookdata) +
autolayer(holt1, series = 'Holt Paperback forecast', PI=FALSE) +
autolayer(holt2, series = 'Holt Hardcover forecast', PI=FALSE)
e_pap_ses <- tsCV(books[,1], ses, h=4)
e_hard_ses <- tsCV(books[,2], ses, h=4)
e_pap_holt <- tsCV(books[,1], holt, h=4)
e_hard_holt <- tsCV(books[,2], holt, h=4)
#Calculate RMSE from the cross-validation errors
ses_pap_rmse <- sqrt(mean(e_pap_ses^2, na.rm = TRUE))
holt_pap_rmse <- sqrt(mean(e_pap_holt^2, na.rm = TRUE))
ses_hard_rmse <- sqrt(mean(e_hard_ses^2, na.rm = TRUE))
holt_hard_rmse <- sqrt(mean(e_hard_holt^2, na.rm = TRUE))
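Collecting the four cross-validated RMSE values into a single named vector makes the comparison discussed below easier to read; a small convenience sketch:
c(ses_paperback = ses_pap_rmse, holt_paperback = holt_pap_rmse,
  ses_hardcover = ses_hard_rmse, holt_hardcover = holt_hard_rmse)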
The cross-validated RMSE for the Paperback series is 38.4091783 for ses and 46.1478633 for holt, indicating that ses gives the better forecast here. Holt assumes a clear trend, which does seem to be present in our data, but the magnitude of the non-cyclical "noise" is so high that it appears to reduce the accuracy of the Holt method. The degree to which the swings obscure the trend makes the data look almost random, which increases the relative accuracy of the ses method.
For the Hardcover data, the corresponding numbers are 38.740656 and 39.428139. These are much closer, though the ses RMSE is still slightly lower. The trend in this data is more obviously positive, explaining why the Holt forecast fares comparatively better here.
For the Paperback dataset I would likely stick with the ses method, since the trend is not as clear as one would like in such a small set of data. For the Hardcover data I would probably go with the Holt forecast, since it accounts for the upward trend within the data and reflects those changes more strongly.
d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
ses_pap_sig <- 34.8183
l1_predict_ses_pap <- 207.1097   # first paperback point forecast from summary(fc2)
min_ses_pap <- l1_predict_ses_pap - 1.96 * ses_pap_sig
max_ses_pap <- l1_predict_ses_pap + 1.96 * ses_pap_sig
holt_pap_sig <- 33.4464
l1_predict_holt_pap <- 209.4668
min_holt_pap <- l1_predict_holt_pap - 1.96 * holt_pap_sig
max_holt_pap <- l1_predict_holt_pap + 1.96 * holt_pap_sig
ses_hard_sig <- 33.0517
l1_predict_ses_hard <- 239.5601
min_ses_hard <- l1_predict_ses_hard - 1.96 * ses_hard_sig
max_ses_hard <- l1_predict_ses_hard + 1.96 * ses_hard_sig
holt_hard_sig <- 29.2106
l1_predict_holt_hard <- 250.1739
min_holt_hard <- l1_predict_holt_hard - 1.96 * holt_hard_sig
max_holt_hard <- l1_predict_holt_hard + 1.96 * holt_hard_sig
The four calculated 95% prediction intervals are:
Paperback ses: \(138.865832 - 275.353568\)
Paperback holt: \(143.911856 - 275.021744\)
Hardcover ses: \(174.778768 - 304.341432\)
Hardcover holt: \(192.921124 - 307.426676\)
We can see that while the Holt forecasts had higher cross-validated RMSE values, their prediction intervals are somewhat narrower than the corresponding ses intervals for both series (the widths are driven by the model \(\sigma\), which is lower for the Holt fits). For Paperbacks, the holt and ses intervals are nearly identical, with the holt interval slightly narrower and its lower bound about 5 sales higher. For Hardcovers, the centre of the holt interval is shifted up by about 10 sales, with both its lower and upper bounds above the corresponding bounds of the ses interval. In each case the manually computed intervals agree closely with those produced by ses and holt.
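For reference, R's own 95% bounds for the first forecast of each model can be pulled straight from the forecast objects instead of being transcribed by hand; a sketch (under the default settings, the second column of $lower/$upper holds the 95% level):
rbind(ses_paperback  = c(fc2$lower[1, 2],   fc2$upper[1, 2]),
      holt_paperback = c(holt1$lower[1, 2], holt1$upper[1, 2]),
      ses_hardcover  = c(fc3$lower[1, 2],   fc3$upper[1, 2]),
      holt_hardcover = c(holt2$lower[1, 2], holt2$upper[1, 2]))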
Exercise 7.7: For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
fc7_1 <- holt(eggs, h=100)
fc7_2 <- holt(eggs, damped = TRUE, phi = 0.98, h=100)
fc7_3 <- holt(eggs, damped = TRUE, phi = 0.975, h=100)
fc7_4 <- holt(eggs, damped = TRUE, phi = 0.97, h=100)
fc7_5 <- holt(eggs, damped = TRUE, phi = 0.95, h=100)
autoplot (eggs) +
autolayer(fc7_1, series = "Holt's Undamped", PI = FALSE) +
autolayer(fc7_2, series = "Damped Holt's phi = 0.98", PI = FALSE) +
autolayer(fc7_3, series = "Damped Holt's phi = 0.975", PI = FALSE) +
autolayer(fc7_4, series = "Damped Holt's phi = 0.97", PI = FALSE) +
autolayer(fc7_5, series = "Damped Holt's phi = 0.95", PI = FALSE) +
ggtitle("Effect of phi on Holt's Method Forecasts") +
xlab('Year')+
ylab('Price of Dozen Eggs in US ($)')+
guides(colour=guide_legend(title = 'Forecast'))
We can see here that applying a damping factor \(\phi\) changes the forecasts from following the overall downward trend indefinitely, as the undamped model does (which is impossible, since it implies a negative egg price somewhere around 2015), to gradually leveling off. This seems like a great example where damping is critical for the model to reflect reality. Adjusting the coefficient \(\phi\) controls how quickly the damping takes effect, with the highest value of 0.98 producing the slowest and most gradual flattening.
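The effect is easiest to see in the forecast equation for the damped trend method, where the trend contribution is multiplied by a geometric sum in \(\phi\) that converges as the horizon \(h\) grows (approaching \(\ell_T + \frac{\phi}{1-\phi}b_T\)) instead of increasing linearly:
\[
\hat{y}_{T+h|T} = \ell_T + (\phi + \phi^2 + \cdots + \phi^h)\,b_T .
\]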
fc7_6 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.3, h=100)
fc7_7 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.4, h=100)
fc7_8 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.6, h=100)
fc7_9 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.8, h=100)
autoplot (eggs) +
autolayer(fc7_2, series = "Damped Holt's alpha estimated (default)", PI = FALSE) +
autolayer(fc7_6, series = "Damped Holt's alpha = 0.3", PI = FALSE) +
autolayer(fc7_7, series = "Damped Holt's alpha = 0.4", PI = FALSE) +
autolayer(fc7_8, series = "Damped Holt's alpha = 0.6", PI = FALSE) +
autolayer(fc7_9, series = "Damped Holt's alpha = 0.8", PI = FALSE) +
ggtitle("Effect of alpha on Holt's Method Forecasts (phi = 0.98)") +
xlab('Year')+
ylab('Price of Dozen Eggs in US ($)')+
guides(colour=guide_legend(title = 'Forecast'))+
ylim(25,80)
Adjusting the \(\alpha\) coefficient controls how much weight is placed on the most recent datapoints when calculating the prediction. Reducing this coefficient increases the relative weight of older datapoints. We can see from the graph above that the starting point of the prediction largely depends on \(\alpha\), with lower coefficients corresponding to higher initial forecasts - this holds because the overall trend of the data is downward, so older datapoints tend to have higher values. This is not, however, a simple vertical shift of the entire projection - the prediction with the highest \(\alpha = 0.8\) starts at the lowest value, yet by the end of the horizon it sits above many of the others.
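To answer which of these models gives the best RMSE, the training accuracy of each holt() fit can be compared directly; a sketch (output not shown, with the Box-Cox variant mentioned in the exercise included only as a commented suggestion):
fits <- list(undamped = fc7_1, phi_0.98 = fc7_2, phi_0.975 = fc7_3,
             phi_0.97 = fc7_4, phi_0.95 = fc7_5,
             alpha_0.3 = fc7_6, alpha_0.4 = fc7_7, alpha_0.6 = fc7_8, alpha_0.8 = fc7_9)
sapply(fits, function(f) accuracy(f)[, "RMSE"])   # training RMSE for each fit
#fc7_bc <- holt(eggs, lambda = BoxCox.lambda(eggs), h = 100)   # possible Box-Cox variant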
Exercise 7.8: Recall your retail time series data (from Exercise 3 in Section 2.10).
#Data available at https://github.com/mkollontai/DATA624
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349874C"],
frequency=12, start=c(1982,4))
autoplot(myts)
In the graph above it is evident that the seasonal variation in the data increases as time moves forward. An additive decomposition could still be applied, but it would leave a seasonal component whose magnitude changes over time. Using the multiplicative method instead keeps the seasonal component at a roughly uniform magnitude - something much easier to work with in subsequent manipulations and analyses.
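In classical decomposition terms, the additive form treats the seasonal component as a fixed offset while the multiplicative form treats it as a proportional factor, which is why the latter handles seasonal swings that grow with the level of the series:
\[
\text{additive: } y_t = S_t + T_t + R_t, \qquad \text{multiplicative: } y_t = S_t \times T_t \times R_t .
\]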
fc8_1 <- hw(myts, seasonal='multiplicative', h=50)
fc8_2 <- hw(myts, seasonal='multiplicative', damped = TRUE, h=50)
autoplot (myts) +
autolayer(fc8_1, series = "Holt-Winters Undamped") +
autolayer(fc8_2, series = "Damped Holt-Winters") +
ggtitle("Holt-Winters Method Forecasts of Retail Data") +
xlab('Year')+
ylab('Retail Data')+
guides(colour=guide_legend(title = 'Forecast'))
In this instance, the damping appears to flatten out the trend of the prediction almost completely.
accuracy(fc8_1)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.2547906 8.354074 5.050794 -0.5269882 4.330325 0.533214
## ACF1
## Training set 0.1163112
accuracy(fc8_2)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.6737245 8.391557 5.007798 0.4181949 4.182986 0.5286749 0.1004214
The undamped prediction has a slightly lower training RMSE. It's difficult to select one as clearly better, but to me the damped version seems preferable, as the data does appear to be leveling off as time progresses.
autoplot(residuals(fc8_2))
The residuals of the damped method do indeed look like white noise, though there are a few noticeable spikes around 2000 and 2010.
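A more formal check of the white-noise assumption could use checkresiduals(), which plots the residuals, their ACF, and a histogram, and runs a Ljung-Box test; a sketch (output not shown):
checkresiduals(fc8_2)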
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
fc_hwd <- hw(myts.train, seasonal='multiplicative', damped = TRUE)
fc_hw <- hw(myts.train, seasonal='multiplicative')
fc_n <- snaive(myts.train)
accuracy(fc_n,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.122823 12.45603 8.666967 5.906047 7.848706 1.000000
## Test set -31.379167 34.36303 31.379167 -18.882634 18.882634 3.620548
## ACF1 Theil's U
## Training set 0.5367142 NA
## Test set 0.1146989 0.8221791
accuracy(fc_hw,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1138424 7.899346 4.50940 -0.5483582 4.082228 0.5202973
## Test set -33.3031918 35.182170 33.30319 -20.1724710 20.172471 3.8425428
## ACF1 Theil's U
## Training set 0.1476838 NA
## Test set 0.3629014 0.8667639
accuracy(fc_hwd,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.5156628 7.575832 4.604223 0.3071277 4.175587 0.5312381
## Test set -32.3466216 34.439038 33.113598 -20.0581682 20.295549 3.8206674
## ACF1 Theil's U
## Training set 0.0879495 NA
## Test set 0.1427832 0.855879
The test-set RMSE for the seasonal naive forecast (34.36) is ever so slightly lower than that of both Holt-Winters methods, and the damped method (34.44) beats out the undamped method (35.18) in terms of RMSE.
Exercise 7.9: For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
autoplot(myts)
myts_la <- BoxCox.lambda(myts)
stl <- stlf(myts.train, lambda = myts_la)
ets <- ets(seasadj(decompose(myts.train, 'multiplicative')))
autoplot(myts.train) +
autolayer(forecast(stl, h = 24), series = 'STL') +
autolayer(forecast(ets, h = 24, PI = FALSE), series = 'ETS') +
autolayer(myts.test, series = 'True Data')
accuracy(stl,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0609065 6.907577 3.940829 -0.1113073 3.42420 0.4546953
## Test set -43.5154392 45.809387 43.515439 -26.3661826 26.36618 5.0208382
## ACF1 Theil's U
## Training set 0.02776491 NA
## Test set 0.46472471 1.134057
ets$mse
## [1] 47.25964
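Since ets$mse reports a mean squared error rather than an RMSE, it helps to convert it before comparing with the RMSE figures above; a small sketch (accuracy() on the fitted model reports the training measures directly):
sqrt(ets$mse)             # ETS training RMSE on the seasonally adjusted series
accuracy(ets)[, "RMSE"]   # should agree closely with the value above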
The resulting training RMSE for the ETS model, \(\sqrt{47.26} \approx 6.87\), is essentially the same as the STL method's training RMSE of 6.91. On the test set, however, the STL forecast's RMSE of 45.81 is well above the 34.44 achieved by the damped Holt-Winters method in the previous exercise, so the damped Holt-Winters remains the best of these forecasts (aside from the seasonal naive).