pigs time series

autoplot(pigs) +
labs(title = "Monthly slaughter of pigs in Victoria",
subtitle = "Jan 1980 - Aug 1995",
x = "Year", y = "")
We use the ses function to fit a simple exponential smoothing model to the time series. The estimated parameter values are reported in the model summary below, which also includes forecast values and prediction intervals for the next four months.
fc <- ses(pigs, h = 4)
sumfc <- summary(fc)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
## ACF1
## Training set 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
We plot the forecast and fitted values together with the original time series below.
autoplot(pigs) +
autolayer(fitted(fc), series = "Fitted values") +
autolayer(fc, series = "Forecast", PI = FALSE) +
labs(title = "Monthly slaughter of pigs in Victoria",
subtitle = "Jan 1980 - Aug 1995",
x = "Year", y = "")
First we compute the standard deviation of the residuals. From the text (equation 5.3), we know that this can be computed as:
\[\hat{\sigma}_e = \sqrt{\frac{1}{T-k-1} \sum_{t=1}^{T} e_t^2}\]
where \(k+1\) is the number of estimated parameters. In this case, there are 2 estimated parameters, so we use \(T-2\) in the denominator above. Then we can compute the residual standard error:
# manually compute rse
s1 <- sqrt(sum(residuals(fc)^2) / (length(residuals(fc)) - 2))
# get rse from model
s2 <- sqrt(fc$model$sigma2)
c("Calculated" = s1, "From model summary" = s2)
## Calculated From model summary
## 10308.58 10308.58
Note that this agrees with the residual standard error as output in the model summary above. Next we use \(\hat{y} \pm 1.96 s\) for the 95% prediction interval for the first month:
# 95% prediction interval
fc$mean[1] + c(Lower95 = -1, Upper95 = 1) * 1.96 * s1
## Lower95 Upper95
## 78611.6 119021.2
fc$mean[1] + c(Lower95 = -1, Upper95 = 1) * qnorm(0.975) * s1
## Lower95 Upper95
## 78611.97 119020.84
The first prediction interval is slightly different from the 95% prediction interval in the model summary above. However, once we use a more precise estimate for the critical value from the normal distribution (1.959964), the prediction interval is identical to the output in the model summary.
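For reference, the more precise critical value quoted above is just the 97.5% quantile of the standard normal distribution:

# two-sided 95% critical value from the standard normal
qnorm(0.975)
## [1] 1.959964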
books dataset

The paperback and hardcover time series are plotted below. The main feature of both series is a consistent upward trend, accompanied by substantial day-to-day fluctuations.
autoplot(books) +
labs(title = "Daily same-store sales of paperback & hardcover books",
x = "Day", y = "")
Although both time series show an upward trend, we start with a simple exponential smoothing model. The 4-day forecasts are shown in the model summaries and plotted below.
fc1 <- ses(books[ , "Paperback"], h = 4)
fc2 <- ses(books[ , "Hardcover"], h = 4)
summary(fc1)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books[, "Paperback"], h = 4)
##
## Smoothing parameters:
## alpha = 0.1685
##
## Initial states:
## l = 170.8271
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
summary(fc2)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books[, "Hardcover"], h = 4)
##
## Smoothing parameters:
## alpha = 0.3283
##
## Initial states:
## l = 149.2861
##
## sigma: 33.0517
##
## AIC AICc BIC
## 315.8506 316.7737 320.0542
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
autoplot(books) +
autolayer(fc1, PI = FALSE, series = "Paperback") +
autolayer(fc2, PI = FALSE, series = "Hardcover") +
labs(title = "Daily same-store sales of paperback & hardcover books",
subtitle = "4-day forecasts with simple exponential smoothing",
x = "Day", y = "")
We compute the RMSE for each series on the training set.
rmse1 <- sqrt(mean(residuals(fc1)^2))
rmse2 <- sqrt(mean(residuals(fc2)^2))
c(Paperback = rmse1, Hardcover = rmse2)
## Paperback Hardcover
## 33.63769 31.93101
We can verify these calculated values using the accuracy function, which shows that the results agree.
accuracy(fc1)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
accuracy(fc2)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
books dataset

We use the holt function to fit linear trend models to both time series. The 4-day forecasts are shown in the model summaries and plotted below.
fc3 <- holt(books[ , "Paperback"], h = 4)
fc4 <- holt(books[ , "Hardcover"], h = 4)
summary(fc3)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books[, "Paperback"], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 170.699
## b = 1.2621
##
## sigma: 33.4464
##
## AIC AICc BIC
## 318.3396 320.8396 325.3456
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
summary(fc4)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books[, "Hardcover"], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 147.7935
## b = 3.303
##
## sigma: 29.2106
##
## AIC AICc BIC
## 310.2148 312.7148 317.2208
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
autoplot(books) +
autolayer(fc1, PI = FALSE, series = "Paperback - Simple") +
autolayer(fc2, PI = FALSE, series = "Hardcover - Simple") +
autolayer(fc3, PI = FALSE, series = "Paperback - Holt") +
autolayer(fc4, PI = FALSE, series = "Hardcover - Holt") +
labs(title = "Daily same-store sales of paperback & hardcover books",
subtitle = "4-day forecasts with linear trend model & simple exponential smoothing",
x = "Day", y = "")
We calculate the RMSE for both models and check the results using the accuracy function.
rmse3 <- sqrt(mean(residuals(fc3)^2))
rmse4 <- sqrt(mean(residuals(fc4)^2))
c(Paperback = rmse3, Hardcover = rmse4)
## Paperback Hardcover
## 31.13692 27.19358
accuracy(fc3)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
accuracy(fc4)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
Comparing the simple exponential smoothing and linear trend models, we see that the linear trend model produces lower training RMSEs for both time series. This is to be expected since both time series show a consistent upward trend, and the linear model includes a parameter to account for this trend. We also expect that the linear trend model would generate more accurate forecasts outside the training data.
rbind(Simple = c(rmse1, rmse2), Holt = c(rmse3, rmse4)) %>%
kable(caption = "RMSE comparison",
col.names= c("Paperback", "Hardcover"))
|  | Paperback | Hardcover |
|---|---|---|
| Simple | 33.63769 | 31.93101 |
| Holt | 31.13692 | 27.19358 |
The 4-day forecasts of the books time series using both methods are plotted in part (a) above, and are also summarized below. Holt's linear trend model is clearly preferable in this case, since both time series exhibit obvious upward trends.
cbind(Day = 1:4,
"Paper-Simple" = fc1$mean,
"Paper-Holt" = fc3$mean,
"Hard-Simple" = fc2$mean,
"Hard-Holt" = fc4$mean) %>%
kable(caption = "Forecast comparison")
| Day | Paper-Simple | Paper-Holt | Hard-Simple | Hard-Holt |
|---|---|---|---|---|
| 1 | 207.1097 | 209.4668 | 239.5601 | 250.1739 |
| 2 | 207.1097 | 210.7177 | 239.5601 | 253.4765 |
| 3 | 207.1097 | 211.9687 | 239.5601 | 256.7791 |
| 4 | 207.1097 | 213.2197 | 239.5601 | 260.0817 |
We compute 1-day prediction intervals using the RMSEs calculated above and assuming normally distributed errors (“FromRMSE”), and compare to the prediction intervals as reported in the model summaries (“FromModel”). We also compute the prediction intervals more precisely using the residual standard errors from the model outputs (“FromRSE”), which account for the degrees of freedom. The results are summarized below.
# simple approximation using rmse
approx1 <- fc1$mean[1] + c(-1, 1) * 1.96 * rmse1
approx2 <- fc2$mean[1] + c(-1, 1) * 1.96 * rmse2
approx3 <- fc3$mean[1] + c(-1, 1) * 1.96 * rmse3
approx4 <- fc4$mean[1] + c(-1, 1) * 1.96 * rmse4
# more precise estimate using rse
est1 <- fc1$mean[1] + c(-1, 1) * qnorm(0.975) * sqrt(fc1$model$sigma2)
est2 <- fc2$mean[1] + c(-1, 1) * qnorm(0.975) * sqrt(fc2$model$sigma2)
est3 <- fc3$mean[1] + c(-1, 1) * qnorm(0.975) * sqrt(fc3$model$sigma2)
est4 <- fc4$mean[1] + c(-1, 1) * qnorm(0.975) * sqrt(fc4$model$sigma2)
rbind("Simple-FromModel" = c(fc1$lower[1,2], fc1$upper[1,2], fc2$lower[1,2], fc2$upper[1,2]),
"Simple-FromRMSE" = c(approx1, approx2),
"Simple-FromRSE" = c(est1, est2),
"Holt-FromModel" = c(fc3$lower[1,2], fc3$upper[1,2], fc4$lower[1,2], fc4$upper[1,2]),
"Holt-FromRMSE" = c(approx3, approx4),
"Holt-FromRSE" = c(est3, est4)) %>%
kable(caption = "Comparison of 95% prediction intervals",
col.names= c("Paper-Lower", "Paper-Upper", "Hard-Lower", "Hard-Upper"))
|  | Paper-Lower | Paper-Upper | Hard-Lower | Hard-Upper |
|---|---|---|---|---|
| Simple-FromModel | 138.8670 | 275.3523 | 174.7799 | 304.3403 |
| Simple-FromRMSE | 141.1798 | 273.0395 | 176.9753 | 302.1449 |
| Simple-FromRSE | 138.8670 | 275.3523 | 174.7799 | 304.3403 |
| Holt-FromModel | 143.9130 | 275.0205 | 192.9222 | 307.4256 |
| Holt-FromRMSE | 148.4384 | 270.4951 | 196.8745 | 303.4733 |
| Holt-FromRSE | 143.9130 | 275.0205 | 192.9222 | 307.4256 |
As expected, there are slight differences between the model output and the estimates using the RMSE values; however, once we use the RSE instead of the RMSE, our prediction intervals are identical to those from the model output.
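To sketch why the two agree once degrees of freedom are accounted for: Holt's method estimates 4 parameters (\(\alpha\), \(\beta\), \(\ell_0\), \(b_0\)), so by equation 5.3 the residual standard error uses \(T-4\) in the denominator. We can verify this against the paperback Holt model (the name e3 and this check are ours, not part of the exercise); both values should equal the sigma of 33.4464 reported in the model summary.

# Holt's method has 4 estimated parameters, so use T - 4 in the denominator
e3 <- residuals(fc3)
c(Calculated = sqrt(sum(e3^2) / (length(e3) - 4)),
  "From model" = sqrt(fc3$model$sigma2))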
eggs time series

First we plot the time series. There is a strong downward trend, which we will account for using Holt's linear trend model. Since we will consider a Box-Cox transformation when we experiment with the linear trend models, we also show the transformed time series in the same figure. The optimal parameter for the Box-Cox transformation is \(\lambda = 0.40\).
BoxCox.lambda(eggs)
## [1] 0.3956183
cbind(Original = eggs, "Box-Cox" = BoxCox(eggs, lambda = "auto")) %>%
autoplot(facets = TRUE) +
labs(title = "U.S. price of one dozen eggs (constant dollars)",
subtitle = "1900-1993",
x = "Year", y = "")
We consider six options for the linear trend model:

- basic Holt
- Holt with a Box-Cox transformation
- Holt with a Box-Cox transformation and bias adjustment
- Holt with a damped trend
- Holt with a damped trend and a Box-Cox transformation
- Holt with a damped trend, a Box-Cox transformation, and bias adjustment

In all cases, the holt function estimates the optimal values of the parameters \(\alpha\) and \(\beta\) (for all models), as well as \(\phi\) (for the models with damped trend). We set \(h = 100\) in order to clearly see the differences in the forecasts produced by the six models. From the plot below, we see immediately that the basic Holt model shouldn't be used, since it projects prices that eventually go negative.
# basic holt
eggsfc1 <- holt(eggs, h = 100)
# holt with box-cox
eggsfc2 <- holt(eggs, h = 100,
lambda = "auto", biasadj = FALSE)
# holt with box-cox with bias-adjustment
eggsfc2a <- holt(eggs, h = 100,
lambda = "auto", biasadj = TRUE)
# holt with damped trend
eggsfc3 <- holt(eggs, h = 100,
damped = TRUE)
# holt with box-cox & damped trend
eggsfc4 <- holt(eggs, h = 100,
lambda = "auto", biasadj = FALSE,
damped = TRUE)
# holt with box-cox with bias-adjustment & damped trend
eggsfc4a <- holt(eggs, h = 100,
lambda = "auto", biasadj = TRUE,
damped = TRUE)
# plot all forecasts
autoplot(eggs) +
autolayer(eggsfc1, series = "Holt", PI = FALSE, lty = 1, lwd = 1.5) +
autolayer(eggsfc2, series = "BoxCox", PI = FALSE, lty = 2, lwd = 1.5) +
autolayer(eggsfc2a, series = "BC-bias", PI = FALSE, lty = 3, lwd = 1.5) +
autolayer(eggsfc3, series = "Damped", PI = FALSE, lty = 1, lwd = 1.5) +
autolayer(eggsfc4, series = "Damped-BC", PI = FALSE, lty = 2, lwd = 1.5) +
autolayer(eggsfc4a, series = "Damped-BC-bias", PI = FALSE, lty = 3, lwd = 1.5) +
labs(title = "U.S. price of one dozen eggs (constant dollars)",
subtitle = "1900-1993 with 100 year forecast",
x = "Year", y = "") +
geom_hline(yintercept = 0, lty = 3, lwd = 1)
Taking a closer look in the close-up plot below, we can see more clearly how the damping, the Box-Cox transformation, and the bias adjustment each alter the long-run trajectory of the forecasts.
# take a closer look
autoplot(eggs) +
autolayer(eggsfc1, series = "Holt", PI = FALSE, lty = 1, lwd = 1.5) +
autolayer(eggsfc2, series = "BoxCox", PI = FALSE, lty = 2, lwd = 1.5) +
autolayer(eggsfc2a, series = "BC-bias", PI = FALSE, lty = 3, lwd = 1.5) +
autolayer(eggsfc3, series = "Damped", PI = FALSE, lty = 1, lwd = 1.5) +
autolayer(eggsfc4, series = "Damped-BC", PI = FALSE, lty = 2, lwd = 1.5) +
autolayer(eggsfc4a, series = "Damped-BC-bias", PI = FALSE, lty = 3, lwd = 1.5) +
labs(title = "U.S. price of one dozen eggs (constant dollars)",
subtitle = "1960-1993 with 30 year forecast",
x = "Year", y = "") +
geom_hline(yintercept = 0, lty = 3, lwd = 1) +
coord_cartesian(xlim = c(1960, 2023),
ylim = c(-50, 200))
Based on the RMSE metrics below, the linear model with Box-Cox transformation and bias-adjustment performs the best, with RMSE = 26.39.
c(Holt = accuracy(eggsfc1)[2],
BoxCox = accuracy(eggsfc2)[2],
"BC-bias" = accuracy(eggsfc2a)[2],
Damped = accuracy(eggsfc3)[2],
"Damped-BC" = accuracy(eggsfc4)[2],
"Damped-BC-bias" = accuracy(eggsfc4a)[2])
## Holt BoxCox BC-bias Damped Damped-BC
## 26.58219 26.39376 26.38689 26.54019 26.53321
## Damped-BC-bias
## 26.58589
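A brief aside on what the bias adjustment does: with a Box-Cox transformation, the back-transformed point forecasts are medians of the forecast distribution unless biasadj = TRUE, in which case they are adjusted to be means. A minimal sketch of ours to see the shift in the eggs forecasts:

# biasadj = FALSE gives median forecasts; biasadj = TRUE gives mean forecasts
head(cbind(Median = eggsfc2$mean, Mean = eggsfc2a$mean))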
For the Australian retail data, I chose the 7th data column of the retail dataset (the 8th column of the spreadsheet, counting the leading date column), which contains "Turnover - New South Wales - Hardware, building and garden supplies retailing". Let's plot the time series.
retaildata <- readxl::read_excel("../Week2/retail.xlsx", skip=1)
colID <- colnames(retaildata)[8]
myts <- ts(retaildata[ , colID], frequency=12, start=c(1982,4))
autoplot(myts) +
labs(title = "Turnover - NSW - Hardware, building & garden supplies retailing",
subtitle = "Monthly 1982/04 - 2013/12",
x = "Year", y = "")
From the time series plot, it appears that the amplitude of the seasonal pattern scales with the level of the time series. This suggests that multiplicative seasonality will be more effective in modeling the time series.
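As a quick sanity check on this choice (a sketch of ours, not required by the exercise), we can fit both seasonal variants and compare their corrected AICs; a lower AICc for the multiplicative model would support the visual impression. The names fit.add and fit.mult are ours.

# compare additive vs multiplicative seasonality by AICc (lower is better)
fit.add <- hw(myts, seasonal = "additive")
fit.mult <- hw(myts, seasonal = "multiplicative")
c(Additive = fit.add$model$aicc, Multiplicative = fit.mult$model$aicc)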
We use the hw function to fit Holt-Winters seasonal models (with multiplicative seasonality), both with and without damping. We plot the original data along with 24-month forecasts from both models.
retailfc1 <- hw(myts, seasonal = "multiplicative")
retailfc2 <- hw(myts, seasonal = "multiplicative", damped = TRUE)
autoplot(myts) +
autolayer(retailfc1, series = "HW", PI = FALSE) +
autolayer(retailfc2, series = "Damped", PI = FALSE) +
labs(title = "Turnover - NSW - Hardware, building & garden supplies retailing",
subtitle = "Time series with 2-year forecast",
x = "Year", y = "")
Below we display the forecasts in close-up to see the differences between the two models. Overall, the two models produce forecasts that are very close to each other, and both forecasts do a good job of replicating the historical seasonality of the time series.
# plot close-up
autoplot(myts) +
autolayer(retailfc1, series = "HW", PI = FALSE) +
autolayer(retailfc2, series = "Damped", PI = FALSE) +
labs(title = "Turnover - NSW - Hardware, building & garden supplies retailing",
subtitle = "Close-up: 2011-2013 with 2-year forecast ",
x = "Year", y = "") +
geom_vline(xintercept = 2014, lty = 3, lwd = 1) +
coord_cartesian(xlim = c(2011, 2016),
ylim = c(250, 450))
The training RMSEs of the two models are very close, although the damped model's RMSE (13.15) is slightly lower than that of the basic Holt-Winters model (13.17).
c(HW = accuracy(retailfc1)[2], Damped = accuracy(retailfc2)[2])
## HW Damped
## 13.17456 13.15306
#summary(retailfc1)
#summary(retailfc2)
Let’s compare some other metrics between the two models, including the residual standard error, AIC, corrected AIC, and BIC. Across all these metrics, the damped Holt-Winters model is stronger than the basic Holt-Winters model.
# use names that don't mask the hw() function, which we call again later
hw.metrics <- c(accuracy(retailfc1)[2], sqrt(retailfc1$model$sigma2),
retailfc1$model$aic, retailfc1$model$aicc, retailfc1$model$bic)
damped.metrics <- c(accuracy(retailfc2)[2], sqrt(retailfc2$model$sigma2),
retailfc2$model$aic, retailfc2$model$aicc, retailfc2$model$bic)
metrics <- round(cbind(HW = hw.metrics, Damped = damped.metrics), 3)
rownames(metrics) <- c("RMSE", "RSE", "AIC", "AICc", "BIC")
kable(metrics,
caption = "Comparison of performance metrics")
|  | HW | Damped |
|---|---|---|
| RMSE | 13.175 | 13.153 |
| RSE | 0.086 | 0.080 |
| AIC | 4326.553 | 4266.723 |
| AICc | 4328.239 | 4268.612 |
| BIC | 4393.580 | 4337.693 |
We use the checkresiduals function to check the residuals of both Holt-Winters models. Unfortunately, the residuals do not resemble white noise: they exhibit strong autocorrelation at the 1-, 6-, and 12-month lags, and the Ljung-Box test confirms that they are unlikely to be white noise.
checkresiduals(retailfc1)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 143.08, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
checkresiduals(retailfc2)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 150.7, df = 7, p-value < 2.2e-16
##
## Model df: 17. Total lags used: 24
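The same Ljung-Box statistic can be reproduced directly with Box.test, using the lag and model degrees of freedom shown in the checkresiduals output above (a cross-check sketch of ours):

# Ljung-Box test on the damped model's residuals: 24 lags, 17 model df
Box.test(residuals(retailfc2), lag = 24, type = "Ljung-Box", fitdf = 17)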
We train the model on the data through the end of 2010, and then evaluate the forecast on the timeframe 2011-2013. We compare the training and test RMSE of the damped Holt-Winters model against the seasonal naive model.
# training set: thru 2010/12
myts.train <- window(myts, end = c(2010, 12))
# HW model: forecast 36 months through 2013/12 (end of time series)
hw.fc <- hw(myts.train,
seasonal = "multiplicative",
damped = TRUE,
h = 36)
hw.rmse <- accuracy(hw.fc, myts)[ , 2]
# seasonal naive model
snaive.fc <- snaive(myts.train, h=36)
snaive.rmse <- accuracy(snaive.fc, myts)[ , 2]
# rmse
cbind("Damped HW" = hw.rmse, "Seas. naive" = snaive.rmse) %>%
kable(caption = "RMSE comparison")
|  | Damped HW | Seas. naive |
|---|---|---|
| Training set | 13.14634 | 26.30758 |
| Test set | 23.10869 | 20.91121 |
Based on the RMSE metrics above, the damped Holt-Winters model performs much better than the seasonal naive model on the training set (RMSE of 13.1 vs. 26.3) but worse on the test set (23.1 vs. 20.9). Let's inspect the forecasts against the data in the plots below. At full scale it's hard to distinguish the forecasts from the two models, so we also show a close-up.
# plot
autoplot(myts) +
autolayer(hw.fc, series = "Damped-HW", PI = FALSE) +
autolayer(snaive.fc, series = "Seas-Naive", PI = FALSE) +
labs(title = "Turnover - NSW - Hardware, building & garden supplies retailing",
subtitle = "1982-2013 with 3-year forecast",
x = "Year", y = "") +
geom_vline(xintercept = 2011, lty = 3, lwd = 1)
# close-up plot
autoplot(myts) +
autolayer(hw.fc, series = "Damped-HW", PI = FALSE) +
autolayer(snaive.fc, series = "Seas-Naive", PI = FALSE) +
autolayer(fitted(hw.fc), series = "DampedHW") +
autolayer(fitted(snaive.fc), series = "SeasNaive") +
labs(title = "Turnover - NSW - Hardware, building & garden supplies retailing",
subtitle = "Close-up: 2008-2013 with 3-year forecast",
x = "Year", y = "") +
geom_vline(xintercept = 2011, lty = 3, lwd = 1) +
coord_cartesian(xlim = c(2008, 2013),
ylim = c(225, 425))
It is apparent in the close-up plot that the seasonal naive model performs poorly (and much worse than the damped Holt-Winters) on the training set (before 2011). However, on the test set (after 2011), we can see that the damped Holt-Winters model generally under-predicts actuals and performs worse than the seasonal naive model.
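We can quantify the under-prediction with a short sketch of ours (the name myts.test is hypothetical): the mean error of the damped Holt-Winters forecasts over the test window is positive when the forecasts sit below the actuals on average.

# mean test-set error; a positive value indicates under-prediction
myts.test <- window(myts, start = c(2011, 1))
mean(myts.test - hw.fc$mean)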
In this exercise, we try a different approach on the Australian retail data from Exercise 7.8. The approach outlined in the question involves the following steps: apply a Box-Cox transformation, decompose the transformed series with STL, forecast the seasonally adjusted series with ETS, and then reseasonalize and back-transform the forecasts.

We can use the stlf function to accomplish all of these steps by setting its arguments as follows:

- h = 36 (test data from 2011/01 through 2013/12)
- s.window = 13 for the seasonal estimation window
- t.window = default value for the trend estimation window
- lambda = "auto" for the Box-Cox transformation
- method = "ets"
- robust = TRUE or FALSE for robust loess fitting (we'll try both)

# training set thru 2010/12, test set thru 2013/12
# stl fit on training set
stl.fc <- stlf(myts.train,
h = 36,
s.window = 13,
lambda = "auto",
method = "ets",
robust = FALSE)
# model summary
stl.fc$model
## ETS(A,N,N)
##
## Call:
## ets(y = x, model = etsmodel, allow.multiplicative.trend = allow.multiplicative.trend)
##
## Smoothing parameters:
## alpha = 0.7218
##
## Initial states:
## l = 68.0462
##
## sigma: 11.1992
##
## AIC AICc BIC
## 3686.948 3687.019 3698.479
# try with robust fitting
stl2.fc <- stlf(myts.train,
h = 36,
s.window = 13,
lambda = "auto",
method = "ets",
robust = TRUE)
Now we compare accuracy measures for this model (both with and without robust fitting) to the models considered in Exercise 7.8. It appears that the seasonal naive model again performs the best on the test set. Of the two STL with ETS models, the version without robust fitting performs better on both the training and test sets.
# rmse
stl.rmse <- accuracy(stl.fc, myts)[ , 2]
stl2.rmse <- accuracy(stl2.fc, myts)[ , 2]
# comparison
cbind("Damped HW" = hw.rmse,
"Seas. naive" = snaive.rmse,
"STL/ETS" = stl.rmse,
"STL/ETS/robust" = stl2.rmse) %>%
kable(caption = "RMSE comparison")
|  | Damped HW | Seas. naive | STL/ETS | STL/ETS/robust |
|---|---|---|---|---|
| Training set | 13.14634 | 26.30758 | 10.76922 | 12.03861 |
| Test set | 23.10869 | 20.91121 | 23.97219 | 27.87913 |
We can visualize the forecasts below, comparing the STL-with-ETS models (with and without robust fitting) to the seasonal naive model. The seasonal naive model appears more responsive to changes in the observations, although it has a tendency to overshoot on the downside.
# close-up plot
autoplot(myts, lwd = 1) +
autolayer(snaive.fc, series = "Seas-Naive", PI = FALSE) +
autolayer(stl.fc, series = "STL/ETS", PI = FALSE) +
autolayer(stl2.fc, series = "STL/ETS/robust", PI = FALSE) +
# autolayer(fitted(snaive.fc), series = "SeasNaive") +
# autolayer(fitted(stl.fc), series = "STLETS") +
# autolayer(fitted(stl2.fc), series = "STLETSrob") +
labs(title = "Turnover - NSW - Hardware, building & garden supplies retailing",
subtitle = "Close-up: 2008-2013 with 3-year forecast",
x = "Year", y = "") +
geom_vline(xintercept = 2011, lty = 3, lwd = 1) +
coord_cartesian(xlim = c(2008, 2013),
ylim = c(225, 425))