library(fpp2)
library(ggplot2)
Consider the pigs series — the number of pigs slaughtered in Victoria each month.
Use the ses() function in R to find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.
fc <- ses(pigs, h = 4)
fc$model
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\), where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
s <- sd(residuals(fc))
ci_95 <- c(Lower = fc$mean[1] - 1.96*s, Upper = fc$mean[1] + 1.96*s)
ci_95
## Lower Upper
## 78679.97 118952.84
The interval produced by R is slightly wider:
ci_95_R <- c(fc$lower[1,2], fc$upper[1,2])
names(ci_95_R) <- c("Lower", "Upper")
ci_95_R
## Lower Upper
## 78611.97 119020.84
The explanation given by Hyndman: “This is because R estimates the variance of the residuals differently, taking account of the degrees of freedom properly.”
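The wider interval can be reproduced from the model's own error variance (a minimal sketch, assuming the ets object returned by ses() stores that variance in fc$model$sigma2):
# Rebuild the interval from the model's sigma estimate, which adjusts for
# the estimated parameters, rather than from plain sd(residuals).
s_model <- sqrt(fc$model$sigma2)
c(Lower = fc$mean[1] - 1.96 * s_model, Upper = fc$mean[1] + 1.96 * s_model)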
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
Plot the series and discuss the main features of the data.
autoplot(books)
Both series appear to trend upward over the 30 days, with substantial day-to-day fluctuations; with so few observations it is hard to identify any clear seasonal pattern.
Use the ses() function to forecast each series, and plot the forecasts.
ses.fc_pb <- ses(books[,'Paperback'], h = 4)
ses.fc_hc <- ses(books[, 'Hardcover'], h = 4)
autoplot(books) +
autolayer(ses.fc_pb, series="Paperback", PI=FALSE) +
autolayer(ses.fc_hc, series="Hardcover", PI=FALSE)
Compute the RMSE values for the training data in each case.
ses.rmse_pb <- sqrt(mean(residuals(ses.fc_pb)^2))
ses.rmse_hc <- sqrt(mean(residuals(ses.fc_hc)^2))
RMSE [Paperback] = 33.64
RMSE [Hardcover] = 31.93
accuracy(ses.fc_pb)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
accuracy(ses.fc_hc)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
We will continue with the daily sales of paperback and hardcover books in data set books.
Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
h.fc_pb <- holt(books[,'Paperback'], h = 4)
h.fc_hc <- holt(books[, 'Hardcover'], h = 4)
autoplot(books) +
autolayer(h.fc_pb, series="Paperback", PI=FALSE) +
autolayer(h.fc_hc, series="Hardcover", PI=FALSE)
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
accuracy(h.fc_pb)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
accuracy(h.fc_hc)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
The RMSE values from Holt’s method are lower than those from SES for both series. This is not surprising: both series show a clear positive trend, and Holt’s method models that trend component explicitly, so it is expected to fit the training data more closely (keeping in mind that it also uses one more parameter than SES).
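For convenience, the training RMSE values can be collected side by side (a minimal sketch reusing the fitted objects above; the values are those reported by accuracy()):
# Training RMSE of SES vs Holt for both series.
data.frame(Paperback = c(SES = accuracy(ses.fc_pb)[1, "RMSE"],
                         Holt = accuracy(h.fc_pb)[1, "RMSE"]),
           Hardcover = c(accuracy(ses.fc_hc)[1, "RMSE"],
                         accuracy(h.fc_hc)[1, "RMSE"]))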
Compare the forecasts for the two series using both methods. Which do you think is best?
The forecasts from Holt’s method [shown below] are higher than those from SES, and they seem more appropriate, given that both series show a clear positive linear trend that SES ignores.
fc <- data.frame(Paperback=c(ses.fc_pb$mean[1], h.fc_pb$mean[1]),
Hardcover=c(ses.fc_hc$mean[1], h.fc_hc$mean[1]),
row.names = c("SES", "HOLT"))
fc
## Paperback Hardcover
## SES 207.1097 239.5601
## HOLT 209.4668 250.1739
Calculate a \(95\%\) prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
The table below compares the prediction intervals computed from the RMSE values with the intervals produced by R using ses() and holt().
s <- accuracy(h.fc_pb)[1,"RMSE"]
r.ci_95_pb <- round(c(Lower = h.fc_pb$mean[1] - 1.96*s, Upper = h.fc_pb$mean[1] + 1.96*s), 2)
s <- accuracy(h.fc_hc)[1,"RMSE"]
r.ci_95_hc <- round(c(Lower = h.fc_hc$mean[1] - 1.96*s, Upper = h.fc_hc$mean[1] + 1.96*s), 2)
s.ci_95_pb <- round(c(ses.fc_pb$lower[1,2], ses.fc_pb$upper[1,2]), 2)
names(s.ci_95_pb) <- c("Lower", "Upper")
s.ci_95_hc <- round(c(ses.fc_hc$lower[1,2], ses.fc_hc$upper[1,2]), 2)
names(s.ci_95_hc) <- c("Lower", "Upper")
h.ci_95_pb <- round(c(h.fc_pb$lower[1,2], h.fc_pb$upper[1,2]), 2)
names(h.ci_95_pb) <- c("Lower", "Upper")
h.ci_95_hc <- round(c(h.fc_hc$lower[1,2], h.fc_hc$upper[1,2]), 2)
names(h.ci_95_hc) <- c("Lower", "Upper")
df.ci_95_R <- data.frame(Paperback=c(paste("(",r.ci_95_pb["Lower"],",",r.ci_95_pb["Upper"],")"),
paste("(",s.ci_95_pb["Lower"],",",s.ci_95_pb["Upper"],")"),
paste("(",h.ci_95_pb["Lower"],",",h.ci_95_pb["Upper"],")")),
Hardcover=c(paste("(",r.ci_95_hc["Lower"],",",r.ci_95_hc["Upper"],")"),
paste("(",s.ci_95_hc["Lower"],",",s.ci_95_hc["Upper"],")"),
paste("(",h.ci_95_hc["Lower"],",",h.ci_95_hc["Upper"],")")),
row.names = c("RMSE", "SES", "HOLT"))
df.ci_95_R
## Paperback Hardcover
## RMSE ( 148.44 , 270.5 ) ( 196.87 , 303.47 )
## SES ( 138.87 , 275.35 ) ( 174.78 , 304.34 )
## HOLT ( 143.91 , 275.02 ) ( 192.92 , 307.43 )
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
fc.1 <- holt(eggs, h = 100)
autoplot(eggs) +
autolayer(fc.1, series="Holt", PI=TRUE)
fc.1$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100)
##
## Smoothing parameters:
## alpha = 0.8124
## beta = 1e-04
##
## Initial states:
## l = 314.7232
## b = -2.7222
##
## sigma: 27.1665
##
## AIC AICc BIC
## 1053.755 1054.437 1066.472
accuracy(fc.1)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04499087 26.58219 19.18491 -1.142201 9.653791 0.9463626
## ACF1
## Training set 0.01348202
fc.2 <- holt(eggs, h = 100, damped = TRUE)
autoplot(eggs) +
autolayer(fc.2, series="Damped", PI=TRUE)
fc.2$model
## Damped Holt's method
##
## Call:
## holt(y = eggs, h = 100, damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.8462
## beta = 0.004
## phi = 0.8
##
## Initial states:
## l = 276.9842
## b = 4.9966
##
## sigma: 27.2755
##
## AIC AICc BIC
## 1055.458 1056.423 1070.718
accuracy(fc.2)
## ME RMSE MAE MPE MAPE MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
## ACF1
## Training set -0.003195358
lambda <- round(BoxCox.lambda(eggs), 2)  # Box-Cox lambda for eggs (~0.4); holt() below selects it automatically via lambda = "auto"
fc.3 <- holt(eggs, h = 100, lambda = "auto")
autoplot(eggs) +
autolayer(fc.3, series="BoxCox", PI=TRUE)
fc.3$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100, lambda = "auto")
##
## Box-Cox transformation: lambda= 0.3956
##
## Smoothing parameters:
## alpha = 0.809
## beta = 1e-04
##
## Initial states:
## l = 21.0322
## b = -0.1144
##
## sigma: 1.0549
##
## AIC AICc BIC
## 443.0310 443.7128 455.7475
accuracy(fc.3)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7736844 26.39376 18.96387 -1.072416 9.620095 0.9354593
## ACF1
## Training set 0.03887152
fc.4 <- holt(eggs, h = 100, damped = TRUE, lambda = "auto")
autoplot(eggs) +
autolayer(fc.4, series="BoxCox + Damped", PI=TRUE)
fc.4$model
## Damped Holt's method
##
## Call:
## holt(y = eggs, h = 100, damped = TRUE, lambda = "auto")
##
## Box-Cox transformation: lambda= 0.3956
##
## Smoothing parameters:
## alpha = 0.8356
## beta = 1e-04
## phi = 0.98
##
## Initial states:
## l = 21.6922
## b = -0.1429
##
## sigma: 1.068
##
## AIC AICc BIC
## 446.2962 447.2617 461.5560
accuracy(fc.4)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.8200445 26.53321 19.45654 -2.019718 9.976131 0.9597618
## ACF1
## Training set 0.005852382
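Collecting the training RMSE of the four variants into a single vector makes the comparison easier (a minimal sketch reusing the models fitted above):
# Training RMSE for each of the four Holt variants.
rmse <- sapply(list(Holt = fc.1, Damped = fc.2,
                    BoxCox = fc.3, `BoxCox + Damped` = fc.4),
               function(f) accuracy(f)[1, "RMSE"])
round(rmse, 3)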
Among the models above, the one with only the Box-Cox transformation (\(\lambda \approx 0.4\)) has the best (lowest) training RMSE.
Recall your retail time series data (from Exercise 3 in Section 2.10).
Why is multiplicative seasonality necessary for this series?
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,"A3349398A"], frequency=12, start=c(1982,4))
autoplot(myts)
The seasonal variations grow roughly in proportion to the level of the series (as is evident in the plot), so a multiplicative seasonal component is the more appropriate and effective choice for this series.
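This can also be checked numerically (an illustrative sketch, not required by the exercise; hw_add and hw_mult are new objects): fitting both seasonal variants and comparing their training RMSE, where the multiplicative fit should track the growing seasonal swings more closely.
# Fit both seasonal variants and compare training RMSE (illustration only).
hw_add <- hw(myts, seasonal = "additive")
hw_mult <- hw(myts, seasonal = "multiplicative")
c(Additive = accuracy(hw_add)[1, "RMSE"],
  Multiplicative = accuracy(hw_mult)[1, "RMSE"])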
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
hw.fc.1 <- hw(myts, seasonal = "multiplicative")
autoplot(hw.fc.1)
hw.fc.2 <- hw(myts, seasonal = "multiplicative", damped = TRUE)
autoplot(hw.fc.2)
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
accuracy(hw.fc.1)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.496135 29.43051 22.25676 0.1603693 1.799731 0.2926361
## ACF1
## Training set -0.0300701
accuracy(hw.fc.2)
## ME RMSE MAE MPE MAPE MASE
## Training set 4.244765 29.63087 22.26018 0.2959731 1.785469 0.292681
## ACF1
## Training set -0.2184672
Based on its slightly lower RMSE, the non-damped method would be preferred. This is not surprising, given that the trend is close to linear and we would expect it to continue that way.
Check that the residuals from the best method look like white noise.
checkresiduals(hw.fc.1)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 244.86, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
Although the histogram of the residuals looks roughly normal, several autocorrelations in the ACF plot are significant and the Ljung-Box test strongly rejects the white-noise hypothesis (p-value < 2.2e-16), so the residuals do not look like white noise.
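The same Ljung-Box test can be run by hand (a sketch; checkresiduals() above already reports it), using 24 lags and fitdf = 16 for the 16 estimated model parameters:
# Ljung-Box test on the residuals: 24 lags, 16 model degrees of freedom.
Box.test(residuals(hw.fc.1), lag = 24, fitdf = 16, type = "Ljung-Box")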
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
myts.train <- window(myts, end = c(2010, 12))
myts.test <- window(myts, start=2011)
hw.fc.train <- hw(myts.train, seasonal = "multiplicative")
hw.fc.a <- accuracy(hw.fc.train, x = myts)
hw.fc.a
## ME RMSE MAE MPE MAPE MASE
## Training set 2.594132 29.87520 22.43516 0.2383033 1.864671 0.2985975
## Test set -29.588194 49.09806 38.39804 -1.1062616 1.458304 0.5110531
## ACF1 Theil's U
## Training set -0.03687893 NA
## Test set 0.09931874 0.267754
sn.fc.train <- snaive(myts.train)
sn.fc.a <- accuracy(sn.fc.train, myts.test)
sn.fc.a
## ME RMSE MAE MPE MAPE MASE
## Training set 73.94114 88.31208 75.13514 6.068915 6.134838 1.000000
## Test set 115.00000 127.92727 115.00000 4.459712 4.459712 1.530576
## ACF1 Theil's U
## Training set 0.6312891 NA
## Test set 0.2653013 0.7267171
The test-set RMSE of 49.1 from Holt-Winters’ method is far better than the 127.93 from the seasonal naïve approach, so it clearly beats the benchmark.
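The two test-set RMSE values can be pulled straight from the accuracy() matrices above (a minimal sketch):
# Test-set RMSE side by side.
c(`Holt-Winters` = hw.fc.a["Test set", "RMSE"],
  `Seasonal naive` = sn.fc.a["Test set", "RMSE"])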
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
stlf.fc <- stlf(myts.train, lambda = "auto")
stlf.fc.a <- accuracy(stlf.fc, myts.test)
stlf.fc.a
## ME RMSE MAE MPE MAPE MASE
## Training set -1.077737 26.34835 19.95571 -0.020850 1.675349 0.2655976
## Test set -101.367987 117.88372 101.84801 -3.846709 3.865079 1.3555310
## ACF1 Theil's U
## Training set -0.04737883 NA
## Test set 0.52778217 0.656884
If the method argument is not specified, the stlf() function applies an ETS model to the seasonally adjusted series. With lambda set to "auto", an (optimal) Box-Cox transformation is applied first.
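For clarity, the same call can be written with the defaults spelled out (a sketch; it assumes stlf() passes the method argument through to the underlying stlm()/ETS fit, and stlf.fc.explicit is a new name used only for this illustration):
# Equivalent call with the defaults made explicit (assumed pass-through).
stlf.fc.explicit <- stlf(myts.train, method = "ets", lambda = "auto")
accuracy(stlf.fc.explicit, myts.test)["Test set", "RMSE"]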
The test-set RMSE of 49.1 from Holt-Winters’ method is still considerably better than the 117.88 from this STL + ETS approach.