7.1) Consider the pigs series - the number of pigs slaughtered in Victoria each month. a) Use the ses() function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
library(fpp2)
library(ggplot2)
fc_pigs <- ses(pigs, h=4)
fc_pigs$model
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
fc_pigs
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
\(\alpha = 0.2971\), \(\ell_0 = 77260\)
# 95% interval for the first forecast: point forecast +/- 1.96 * s,
# where s is the standard deviation of the residuals
yhigh <- 98816.41 + 1.96*(sd(fc_pigs$residuals))
ylow <- 98816.41 - 1.96*(sd(fc_pigs$residuals))
ylow
## [1] 78679.97
yhigh
## [1] 118952.8
I calculated the 95% prediction interval to be 78,679.97 to 118,952.8, while R reports 78,611.97 to 119,020.8. These are close but not identical; R's interval is based on the model's estimated \(\sigma\) (10,308.58 above) rather than the sample standard deviation of the residuals, and it uses the exact quantile rather than the rounded 1.96.
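A quick check of where the gap comes from (a sketch; it assumes the fitted model stores its variance estimate in model$sigma2, as ets-based models in the forecast package do):
sd(fc_pigs$residuals)       # spread used in the manual interval
sqrt(fc_pigs$model$sigma2)  # sigma used by ses() for its intervals (10308.58 above)
qnorm(0.975)                # exact quantile, versus the rounded 1.96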
7.5) Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
autoplot(books) + ylab("Number of Books Sold at the Same Store") + xlab("Day") + ggtitle("Daily Book Sales")
The graph displays the daily sales of paperback and hardcover books. Sales of both appear to have an upward trend over the course of the month. Both series fluctuate from day to day, but I cannot identify a fixed seasonal period for either, and the peaks and troughs in paperback and hardcover sales do not consistently coincide.
pbBooks <- books[,1]  # paperback sales
hcBooks <- books[,2]  # hardcover sales
ses_paper_books <- ses(pbBooks, h=4)
ses_hc_books <- ses(hcBooks, h=4)
autoplot(ses_paper_books) +
autolayer(fitted(ses_paper_books), series="SES Model") +
ylab("Number of Paperback Books Sold at the Same Store") + xlab("Day") + ggtitle("Daily Paperback Book Sales")
The simple exponential smoothing model for paperback sales deviates from the data: its fitted values show much less day-to-day variation than the actual series, and their peaks and troughs do not line up with those in the data. The 95% prediction interval for day 31 is very wide, from about 139 to 275 books sold.
autoplot(ses_hc_books) +
autolayer(fitted(ses_hc_books), series="SES Model") +
ylab("Number of Harcover Books Sold at the Same Store") + xlab("Day") + ggtitle("Daily Hardcover Books Sales")
The simple exponential smoothing model for hardcover sales also deviates from the data. The fitted values follow the series' upward drift, but again show much less day-to-day variation than the actual data, and their peaks and troughs do not match those in the data. The 95% prediction interval for day 31 is very wide, from about 175 to 304 books sold.
pb_e <- tsCV(pbBooks, ses, h=4)
rmse_pb <- sqrt(mean(pb_e^2, na.rm=TRUE))
rmse_pb
## [1] 38.40918
The cross-validated RMSE for the simple exponential smoothing model for paperback books is 38.4, so its forecasts are off by roughly 38 books per day on average.
hc_e <- tsCV(hcBooks, ses, h=4)
rmse_hc <- sqrt(mean(hc_e^2, na.rm=TRUE))
rmse_hc
## [1] 38.74066
The cross-validated RMSE for the simple exponential smoothing model for hardcover books is 38.7, so its forecasts are off by roughly 39 books per day on average.
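Because h=4, tsCV() returns a matrix of errors with one column per forecast horizon, so the RMSE values above average over horizons of one to four days ahead. A per-horizon breakdown (a sketch) would look like this:
# RMSE at each horizon (columns 1-4 of the cross-validation error matrices)
sqrt(colMeans(pb_e^2, na.rm=TRUE))
sqrt(colMeans(hc_e^2, na.rm=TRUE))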
7.6) a) Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
fc_holt_pb <- holt(pbBooks, h=4)
fc_holt_hc <- holt(hcBooks, h=4)
autoplot(fc_holt_pb) +
autolayer(fitted(fc_holt_pb), series="Holt Model") +
autolayer(fitted(ses_paper_books), series="SES Model") +
ylab("Number of Paperback Books Sold at the Same Store") + xlab("Day") + ggtitle("Daily Paperback Book Sales")
The Holt model incorporates a trend component, and its forecasts for paperback sales increase over the four days. The 95% prediction interval for day 31 is still wide, from about 144 to 275 books sold, but it is slightly narrower than the SES interval.
autoplot(fc_holt_hc) +
autolayer(fitted(fc_holt_hc), series="Holt Model") +
autolayer(fitted(ses_hc_books), series="SES Model") +
ylab("Number of Hardcover Books Sold at the Same Store") + xlab("Day") + ggtitle("Daily Hardcover Book Sales")
The Holt forecasts for hardcover sales also display an increasing trend. The 95% prediction interval for day 31 is wide, from about 193 to 307 books sold, and is a little narrower than the SES interval.
b+c) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets. Compare the forecasts for the two series using both methods. Which do you think is best? (I combined my answers to questions b and c.)
pb_holt_e <- tsCV(pbBooks, holt, h=4)
rmse_holt_pb <- sqrt(mean(pb_holt_e^2, na.rm=TRUE))
rmse_holt_pb
## [1] 46.14786
The cross-validated RMSE of Holt’s method for paperback sales is 46.1, higher than the RMSE of simple exponential smoothing.
hc_holt_e <- tsCV(hcBooks, holt, h=4)
rmse_holt_hc <- sqrt(mean(hc_holt_e^2, na.rm=TRUE))
rmse_holt_hc
## [1] 39.42814
The cross-validated RMSE of Holt’s method for hardcover sales is 39.4, slightly higher than the RMSE of simple exponential smoothing (38.7).
The root mean square errors are higher for the Holt model, which simply extrapolates the trend in the data. Even so, its 95% prediction intervals are narrower, and because both series clearly trend upward I expect Holt’s method to forecast better than the SES model.
ses_paper_books
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
ses_paper_books$mean[1]-1.96*rmse_pb
## [1] 131.8277
ses_paper_books$mean[1]+1.96*rmse_pb
## [1] 282.3917
The 95% prediction interval I calculated using the RMSE for paperback books with the ses method is from 132-282 books sold. The ses model’s 95% interval is between 139-275 books.
fc_holt_pb
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
fc_holt_pb$mean[1]-1.96*rmse_holt_pb
## [1] 119.017
fc_holt_pb$mean[1]+1.96*rmse_holt_pb
## [1] 299.9166
The 95% prediction interval I calculated using the RMSE for paperback books with the Holt method is from 119-299 books sold. The Holt model’s 95% interval is between 144-275 books.
ses_hc_books
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
ses_hc_books$mean[1]-1.96*rmse_hc
## [1] 163.6284
ses_hc_books$mean[1]+1.96*rmse_hc
## [1] 315.4918
The 95% prediction interval I calculated using the RMSE for hardcover books with the ses method is from 164-315 books sold. The ses model’s 95% interval is between 175-304 books.
fc_holt_hc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
fc_holt_hc$mean[1]-1.96*rmse_holt_hc
## [1] 172.8947
fc_holt_hc$mean[1]+1.96*rmse_holt_hc
## [1] 327.453
The 95% prediction interval I calculated using the RMSE for hardcover books with the Holt method is from 172-327 books sold. The Holt model’s 95% interval is between 193-307 books.
In each case, the 95% prediction interval computed from the cross-validated RMSE is wider than the interval produced by the model. This is expected: the tsCV() RMSE averages squared errors over horizons of one to four days, whereas the model’s day-31 interval is based only on the one-step residual standard deviation, so the RMSE overstates the one-step uncertainty.
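As a rough check (a sketch; column 1 of the tsCV error matrix holds the one-step-ahead errors), an interval built from the one-step RMSE alone should sit closer to the model’s own day-31 interval:
rmse_pb_h1 <- sqrt(mean(pb_e[,1]^2, na.rm=TRUE))        # one-step-ahead errors only
ses_paper_books$mean[1] + c(-1, 1)*1.96*rmse_pb_h1      # day-31 interval from the one-step RMSE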
7.7) For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts. [Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.] Which model gives the best RMSE?
fc_holt_eggs <- holt(eggs, h=100)
fc_holt_eggs_damped <- holt(eggs, damped=TRUE, h=100)
fc_holt_eggs_damped[["model"]]
## Damped Holt's method
##
## Call:
## holt(y = eggs, h = 100, damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.8462
## beta = 0.004
## phi = 0.8
##
## Initial states:
## l = 276.9842
## b = 4.9966
##
## sigma: 27.2755
##
## AIC AICc BIC
## 1055.458 1056.423 1070.718
fc_holt_eggs_boxcox <- holt(eggs, lambda=0.5, h=100)
autoplot(eggs) +
autolayer(fc_holt_eggs, series="Holt's method", PI=FALSE) +
autolayer(fc_holt_eggs_damped, series="Damped Holt's method phi=0.8", PI=FALSE) +
autolayer(fc_holt_eggs_boxcox, series="Box Cox", PI=FALSE) +
ggtitle("Forecasts from Holt's method") + xlab("Year") +
ylab("Price of a Dozen Eggs in the United States (dollars)") +
guides(colour=guide_legend(title="Forecast"))
eggs_holt_error <- tsCV(eggs, holt, h=100)
rmse_holt_eggs <- sqrt(mean(eggs_holt_error^2, na.rm=TRUE))
rmse_holt_eggs
## [1] 277.5396
eggs_damped_holt_error <- tsCV(eggs, holt, damped=TRUE, h=100)
rmse_holt_eggs_damped <- sqrt(mean(eggs_damped_holt_error^2, na.rm=TRUE))
rmse_holt_eggs_damped
## [1] 278.5824
eggs_boxcox_error <- tsCV(eggs, holt, lambda=0.5, h=100)
rmse_boxcox_eggs<- sqrt(mean(eggs_boxcox_error^2, na.rm=TRUE))
rmse_boxcox_eggs
## [1] 119.066
The forecast from Holt’s method continues the downward trend in the data. The forecast from the damped method levels off to a nearly horizontal line, similar to a naive forecast. The forecast from Holt’s method with a Box-Cox transformation lies between the other two. The cross-validated root mean square error is markedly lower for the Box-Cox version.
7.8) Recall your retail time series data (from Exercise 3 in Section 2.10). a) Why is multiplicative seasonality necessary for this series?
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349399C"],frequency=12, start=c(1982,4))
autoplot(myts) + ylab("Retail Clothing Sales") + ggtitle("New South Wales - Clothing Sales")
Multiplicative seasonality is necessary because the size of the seasonal fluctuations grows as the level of the series increases; an additive seasonal term could not capture that.
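A quick numeric check (a sketch; retail_hw_add and retail_hw_mult are names introduced here, and it assumes the underlying ets fit exposes its corrected AIC as model$aicc): fit both seasonal forms and compare.
retail_hw_add <- hw(myts, seasonal="additive", h=10)
retail_hw_mult <- hw(myts, seasonal="multiplicative", h=10)
retail_hw_add$model$aicc   # additive seasonality
retail_hw_mult$model$aicc  # multiplicative seasonality should score lower here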
retail_hw <- hw(myts,seasonal="multiplicative", h=10)
retail_hw_damped <- hw(myts,seasonal="multiplicative", damped=TRUE, h=10)
autoplot(myts) +
autolayer(retail_hw, series="HW multiplicative forecasts", PI=FALSE) +
autolayer(retail_hw_damped, series="Damped HW multiplicative forecasts", PI=FALSE) +
xlab("Year") +
ylab("Retail Clothing Sales") +
ggtitle("New South Wales - Clothing Sales") +
guides(colour=guide_legend(title="Forecast"))
retailhw_error <- tsCV(myts, hw, seasonal="multiplicative", h=1)
rmse_retailhw <- sqrt(mean(retailhw_error^2, na.rm=TRUE))
rmse_retailhw
## [1] 14.18924
retailhw_error_damped <- tsCV(myts, hw, seasonal="multiplicative", damped=TRUE, h=1)
rmse_retailhw_damped<- sqrt(mean(retailhw_error_damped^2, na.rm=TRUE))
rmse_retailhw_damped
## [1] 14.4676
The cross-validated one-step root mean square error is slightly lower for the undamped Holt-Winters method.
checkresiduals(retail_hw)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 67.421, df = 8, p-value = 1.6e-11
##
## Model df: 16. Total lags used: 24
The residuals are centered around zero and their distribution is close to normal, with no obvious pattern in the time plot. For white noise we would expect about 95% of the ACF spikes to lie inside the bounds; here only about 91% do, and the Ljung-Box test (p-value = 1.6e-11) indicates that some autocorrelation remains, so the residuals are not quite white noise.
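How the share of ACF spikes inside the bounds can be computed (a sketch, using the usual \(\pm 1.96/\sqrt{n}\) white-noise bounds):
res <- residuals(retail_hw)
r <- acf(res, lag.max=24, plot=FALSE)$acf[-1]    # autocorrelations, dropping lag 0
mean(abs(r) < 1.96/sqrt(length(res)))            # proportion of spikes inside the bounds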
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
fc_naive <- snaive(myts.train)
accuracy(fc_naive,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.007207 21.13832 16.58859 4.224080 7.494415 1.000000
## Test set 10.362500 21.50499 18.99583 2.771495 5.493632 1.145115
## ACF1 Theil's U
## Training set 0.5277855 NA
## Test set 0.7420700 0.3223094
The RMSE on the test set is 21.5 using the seasonal naive method.
fc_hw <- hw(myts.train,seasonal="multiplicative")
accuracy(fc_hw,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.4181467 12.62076 9.483714 -0.5602421 4.615519 0.5717011
## Test set -5.6475034 14.47224 10.911929 -1.8801266 3.191901 0.6577973
## ACF1 Theil's U
## Training set 0.06436742 NA
## Test set 0.54835802 0.2212264
The RMSE on the test set is 14.5, which is lower than for the seasonal naive approach, so the Holt-Winters model is the better predictor.
7.9) For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
lambda_retail <- BoxCox.lambda(myts.train)
boxcox_retail <- BoxCox(myts.train,lambda_retail)
boxcox_retail_test <- BoxCox(myts.test,lambda_retail)
autoplot(BoxCox(myts.train,lambda_retail)) + ggtitle("Box Cox Transformation of Retail Clothing Sales in New South Wales")
library(seasonal)
boxcox_ts = ts(boxcox_retail[1:138,1], start=c(1982,4), end=c(2010, 12), frequency = 12)
stl_retail <- stl(boxcox_ts, s.window="periodic", robust=TRUE)
autoplot(stl_retail)+ ggtitle("STL Decomposition of Box Cox Transformation of Retail Clothing Sales in New South Wales")
seasadj_retail <- seasadj(stl_retail)
fit <- ets(seasadj_retail, lambda=0)
fc <- forecast(fit)
fvar <- ((BoxCox(fc$upper,fit$lambda) -
BoxCox(fc$lower,fit$lambda))/qnorm(0.975)/2)^2
accuracy(fvar,boxcox_retail_test)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 7.010702 7.014896 7.010702 99.98535 99.98535 0.2301583 27.1074
The reported RMSE is 7.0, but it is computed on the Box-Cox transformed scale, so it is not directly comparable with the earlier test-set RMSEs (21.5 and 14.5), which are on the original sales scale.
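For a comparison on the original scale, the seasonally adjusted forecasts would need to be re-seasonalised and back-transformed first. A minimal sketch, assuming the STL seasonal pattern simply repeats (s.window="periodic") and introducing h_test, fc_adj, seas_fc and fc_original purely for illustration:
h_test <- length(myts.test)
fc_adj <- forecast(fit, h=h_test)                   # forecasts of the seasonally adjusted series
seas <- stl_retail$time.series[,"seasonal"]         # STL seasonal component (Box-Cox scale)
seas_fc <- rep(tail(seas, 12), length.out=h_test)   # repeat the final seasonal year
fc_boxcox <- fc_adj$mean + seas_fc                  # re-seasonalise on the Box-Cox scale
fc_original <- InvBoxCox(fc_boxcox, lambda_retail)  # back to the original sales scale
accuracy(fc_original, myts.test)                    # RMSE now comparable with 21.5 and 14.5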