7.1 Consider the pigs series — the number of pigs slaughtered in Victoria each month.

a) Use the ses() function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

First, we will look at the series:

# Plot series
autoplot(pigs) +
  scale_y_continuous(labels=comma) +
  labs(title="Monthly Pig Slaughter", subtitle = "Victoria, Australia",
       y="# Animals", x="Year")

The plot above shows no clear trend, but there does appear to be some possible seasonality. SES may therefore not be the most appropriate method here, but it is what the question asks for, so we will proceed.
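To get a better look at the suspected seasonality, a seasonal subseries plot can help. This is a quick sketch using the forecast package's ggsubseriesplot(); the scale and label calls simply mirror those used above.

# Seasonal subseries plot: one mini-series per calendar month, which makes
# any within-year pattern easier to judge than the raw time plot
ggsubseriesplot(pigs) +
  scale_y_continuous(labels = comma) +
  labs(title = "Monthly Pig Slaughter by Month",
       subtitle = "Victoria, Australia",
       y = "# Animals", x = "Month")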

# Use simple exponential smoothing
pigs.ses <- ses(pigs,h=4)

# Get the optimized parameters
pigs.ses$model
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

The output from the ses() function shows us an \(\alpha\) value of 0.2971 and an initial state (\(\ell_0\)) of 77260.0561.
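As a sanity check, we can reproduce the SES recursion \(\ell_t=\alpha y_t+(1-\alpha)\ell_{t-1}\) by hand and confirm that the final level equals the flat point forecast. This is a small sketch, assuming the fitted parameters are stored in pigs.ses$model$par under the names "alpha" and "l", as they are for ets-based models.

# Run the SES level recursion manually over the whole series
alpha <- pigs.ses$model$par[["alpha"]]
level <- pigs.ses$model$par[["l"]]   # initial level l_0
for (y in as.numeric(pigs)) {
  level <- alpha * y + (1 - alpha) * level
}

# The hand-rolled final level should match ses()'s (flat) point forecast
c(manual = level, ses = unname(pigs.ses$mean[1]))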

# Plot series and smoothing
autoplot(pigs, series="Original") +
  autolayer(pigs.ses$fitted, series="Smoothed") +
  autolayer(pigs.ses, series="Smoothed") +
  scale_color_manual(name="",
                     values=c("Original"="grey50","Smoothed"="darkred"),
                     breaks=c("Original","Smoothed")) +
  scale_y_continuous(labels=comma) +
  labs(title="Monthly Pig Slaughter", subtitle = "Victoria, Australia",
       y="# Animals", x="Year")

b) Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
# Compute 95% prediction interval for the first forecast
int <- sd(pigs.ses$residuals) * 1.96
low <- pigs.ses$mean[1] - int
high <- pigs.ses$mean[1] + int

list(Low = low, High = high)
## $Low
## [1] 78679.97
## 
## $High
## [1] 118952.8

We can then compare to the built-in prediction intervals:

list(Low = pigs.ses[["lower"]][1,2], High = pigs.ses[["upper"]][1,2])
## $Low
##      95% 
## 78611.97 
## 
## $High
##      95% 
## 119020.8

The manual interval is 40,272.88 wide, while the interval produced by the R function is slightly wider at 40,408.88. The difference arises because the built-in interval is based on the model's estimated \(\sigma\) (10,308.58 above) rather than the sample standard deviation of the residuals.
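We can see where the small difference comes from by printing both estimates side by side. This is a sketch, assuming the fitted ets model exposes its error variance as sigma2.

# The model's sigma (used by R's interval) vs. the sample sd of the residuals
c(model_sigma  = sqrt(pigs.ses$model$sigma2),
  sd_residuals = sd(pigs.ses$residuals))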


7.5 Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

a) Plot the series and discuss the main features of the data.
autoplot(books,facets=T) +
  labs(title="Daily Book Sales", y="# Books Sold", x="Day")

For both types of books (Paperback and Hardcover) there appears to be an upward trend over time. Neither shows an obvious seasonal pattern, though there is quite a bit of day-to-day variation in sales.

b) Use the ses() function to forecast each series, and plot the forecasts.
# Run ses on each series
paperback <- ses(books[,"Paperback"],h=4)
hardcover <- ses(books[,"Hardcover"], h=4)

autoplot(books[,"Paperback"], series = "Original Data") +
  autolayer(paperback$fitted, series="Smoothed (Simple)") +
  autolayer(paperback, series="Smoothed (Simple)") +
  scale_color_manual(name="",
                     values=c("Original Data"="grey25",
                              "Smoothed (Simple)"="darkred"),
                     breaks=c("Original Data","Smoothed (Simple)")) +
  labs(title="Daily Sales of Paperback Books", y="# Books", x="Day")

autoplot(books[,"Hardcover"], series = "Original Data") +
  autolayer(hardcover$fitted, series="Smoothed (Simple)") +
  autolayer(hardcover, series="Smoothed (Simple)") +
  scale_color_manual(name="",
                     values=c("Original Data"="grey25",
                              "Smoothed (Simple)"="darkred"),
                     breaks=c("Original Data","Smoothed (Simple)")) +
  labs(title="Daily Sales of Hardcover Books", y="# Books", x="Day")

c) Compute the RMSE values for the training data in each case.

The RMSE for the Paperback training data is 33.64, and the RMSE for the Hardcover training data is 31.93.
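These values come from accuracy(), which reports training-set error measures for a forecast object. A brief sketch:

# Training-set RMSE for each SES fit
c(Paperback = accuracy(paperback)["Training set", "RMSE"],
  Hardcover = accuracy(hardcover)["Training set", "RMSE"])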


7.6 We will continue with the daily sales of paperback and hardcover books in data set books.

a) Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
paperback.holt <- holt(books[,"Paperback"],h=4)
hardcover.holt <- holt(books[,"Hardcover"],h=4)

autoplot(books[,"Paperback"], series = "Original Data") +
  autolayer(paperback.holt$fitted, series="Smoothed (Holt)") +
  autolayer(paperback.holt, series="Smoothed (Holt)") +
  scale_color_manual(name="",
                     values=c("Original Data"="grey25",
                              "Smoothed (Holt)"="darkred"),
                     breaks=c("Original Data","Smoothed (Holt)")) +
  labs(title="Daily Sales of Paperback Books", y="# Books", x="Day")

autoplot(books[,"Hardcover"], series = "Original Data") +
  autolayer(hardcover.holt$fitted, series="Smoothed (Holt)") +
  autolayer(hardcover.holt, series="Smoothed (Holt)") +
  scale_color_manual(name="",
                     values=c("Original Data"="grey25",
                              "Smoothed (Holt)"="darkred"),
                     breaks=c("Original Data","Smoothed (Holt)")) +
  labs(title="Daily Sales of Hardcover Books", y="# Books", x="Day")

b) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

Using Holt’s method, the RMSE for the Paperback sales is 31.14 while the RMSE for Hardcover is 27.19.

These values are somewhat lower than those from SES, indicating that Holt’s method fits the training data slightly better.
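As in 7.5(c), the RMSE values can be pulled from accuracy(). A sketch:

# Training-set RMSE, SES vs. Holt, for each series
rbind(SES  = c(Paperback = accuracy(paperback)["Training set", "RMSE"],
               Hardcover = accuracy(hardcover)["Training set", "RMSE"]),
      Holt = c(Paperback = accuracy(paperback.holt)["Training set", "RMSE"],
               Hardcover = accuracy(hardcover.holt)["Training set", "RMSE"]))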

We can also look at the parameters chosen for each method:

p1 <- rbind(tibble(method = "SES", type = "Paperback", 
       alpha = paperback$model$par[1], beta = as.double(NA)),
      tibble(method = "SES", type = "Hardcover", 
       alpha = hardcover$model$par[1], beta = as.double(NA)))

p2 <- rbind(tibble(method = "Holt", type = "Paperback", 
       alpha = paperback.holt$model$par[1],
       beta = paperback.holt$model$par[2]),
      tibble(method = "Holt", type = "Hardcover", 
       alpha = hardcover.holt$model$par[1],
       beta = hardcover.holt$model$par[2]))

kable(rbind(p1,p2)) %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
method  type       alpha      beta
SES     Paperback  0.1685372  NA
SES     Hardcover  0.3282665  NA
Holt    Paperback  0.0001000  1e-04
Holt    Hardcover  0.0001000  1e-04

The SES models chose larger \(\alpha\) values (especially for Hardcover books), putting more weight on recent observations. Holt’s method, by contrast, chose extremely small \(\alpha\) values along with similarly small \(\beta^*\) values, so the estimated level and trend barely react to individual observations.

In fact, the Holt method isn’t that far from a simple regression line:

# Fit a simple linear regression of Paperback sales on the day index
linear <- rowid_to_column(as.data.frame(books)) %>%
  {lm(Paperback ~ rowid, data = .)}

# Predict over the 30 observed days plus the 4 forecast days
linear.pred <- predict(linear, newdata = tibble(rowid = 1:34))

autoplot(books[,"Paperback"], series = "Original Data") +
  autolayer(paperback.holt$fitted, series="Smoothed (Holt)") +
  autolayer(paperback.holt, series="Smoothed (Holt)") +
  autolayer(ts(linear.pred), series="Linear") +
  scale_color_manual(name="",
                     values=c("Original Data"="grey25",
                              "Smoothed (Holt)"="darkred",
                              "Linear"="darkgreen"),
                     breaks=c("Original Data","Smoothed (Holt)",
                              "Linear")) +
  labs(title="Daily Sales of Paperback Books", y="# Books", x="Day")

c) Compare the forecasts for the two series using both methods. Which do you think is best?

Comparing the two methods, I would prefer Holt’s method to SES. Not only does Holt’s method produce a (slightly) smaller RMSE, but its forecast continues the apparent upward trend, whereas SES predicts a flat value. Neither looks especially good to me, though, as both have very wide prediction intervals.

d) Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

Using the formula \(\hat{y}_{T+h|T}\pm1.96\hat{\sigma}_h\), where \(\hat{\sigma}_h\) is the standard deviation of the residuals, we compute the following intervals for the first forecast:

# Get Std. Deviations
sigmas <- list("SES Paperback" = sd(paperback$residuals))
sigmas[["SES Hardcover"]] <- sd(hardcover$residuals)
sigmas[["Holt Paperback"]] <- sd(paperback.holt$residuals)
sigmas[["Holt Hardcover"]] <- sd(hardcover.holt$residuals)

PIs <- tibble(Method = "SES Paperback",
              Low = paperback$mean[[1]] - (1.96 * sigmas[["SES Paperback"]]),
              High = paperback$mean[[1]] + (1.96 * sigmas[["SES Paperback"]]))

PIs <- rbind(PIs,
  tibble(Method = "SES Hardcover",
              Low = hardcover$mean[[1]] - (1.96 * sigmas[["SES Hardcover"]]),
              High = hardcover$mean[[1]] + (1.96 * sigmas[["SES Hardcover"]]))
  )

PIs <- rbind(PIs,
  tibble(Method = "Holt Paperback",
              Low = paperback.holt$mean[[1]] - (1.96 * sigmas[["Holt Paperback"]]),
              High = paperback.holt$mean[[1]] + (1.96 * sigmas[["Holt Paperback"]]))
  )

PIs <- rbind(PIs,
  tibble(Method = "Holt Hardcover",
              Low = hardcover.holt$mean[[1]] - (1.96 * sigmas[["Holt Hardcover"]]),
              High = hardcover.holt$mean[[1]] + (1.96 * sigmas[["Holt Hardcover"]]))
  )

PIs$Width <- PIs$High - PIs$Low

kable(PIs) %>% kable_styling(bootstrap_options = "striped", full_width = F)
Method           Low       High      Width
SES Paperback    141.5964  272.6230  131.0266
SES Hardcover    178.5848  300.5354  121.9505
Holt Paperback   147.8390  271.0945  123.2555
Holt Hardcover   195.9640  304.3838  108.4198

We can compare to the built-in prediction intervals from the R functions:

PIs.auto <- tibble(Method = "SES Paperback",
              Low = paperback$lower[[1,2]],
              High = paperback$upper[[1,2]])

PIs.auto <- rbind(PIs.auto,
                  tibble(Method = "SES Hardcover",
                         Low = hardcover$lower[[1,2]],
                         High = hardcover$upper[[1,2]])
                  )
PIs.auto <- rbind(PIs.auto,
                  tibble(Method = "Holt Paperback",
                         Low = paperback.holt$lower[[1,2]],
                         High = paperback.holt$upper[[1,2]])
                  )
PIs.auto <- rbind(PIs.auto,
                  tibble(Method = "Holt Hardcover",
                         Low = hardcover.holt$lower[[1,2]],
                         High = hardcover.holt$upper[[1,2]])
                  )
PIs.auto$Width <- PIs.auto$High - PIs.auto$Low

kable(PIs.auto) %>% kable_styling(bootstrap_options = "striped", full_width = F)
Method           Low       High      Width
SES Paperback    138.8670  275.3523  136.4853
SES Hardcover    174.7799  304.3403  129.5604
Holt Paperback   143.9130  275.0205  131.1076
Holt Hardcover   192.9222  307.4256  114.5034

The prediction intervals for the first forecast are wider when generated by the built-in functions, again because they are based on each model’s estimated \(\sigma\) rather than the sample standard deviation of the residuals.


7.7 For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

First, we’ll plot the data:

autoplot(eggs) +
  scale_y_continuous(labels=dollar) +
  labs(title="U.S. Egg Prices", subtitle="(Per Dozen)",
       x="Year", y="Price")

Immediately, I question the data: I most certainly was not paying $62.27 for a dozen eggs in 1993! Most likely the series is recorded in cents (deflated to constant dollars) rather than in dollars per dozen. Nevertheless, we will press on.

Next we’ll look at Holt’s method and see how it forecasts:

eggs.holt <- holt(eggs,h=100)

autoplot(eggs.holt) +
  scale_y_continuous(labels=dollar) +
  labs(title="U.S. Egg Prices", subtitle="(Per Dozen) - Holt",
       x="Year", y="Price")

The linear Holt method produces a forecast that simply continues the downward trend. In fact, the forecast eventually dips well below zero, which makes no sense for a price, though the long horizon of \(h=100\) exaggerates this.

Next we’ll try with the dampened trend:

eggs.holtDamped <- holt(eggs,h=100, damped = T)

autoplot(eggs.holtDamped) +
  scale_y_continuous(labels=dollar) +
  labs(title="U.S. Egg Prices",
       subtitle="(Per Dozen) - Holt with dampening",
       x="Year", y="Price")

The damped version cancels out the trend almost immediately and produces a nearly flat forecast. This is because the estimated damping parameter \(\phi\) is 0.8, the lower bound of its default range, so the trend is damped away very quickly.
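The estimated damping parameter can be read directly from the model object. A sketch, assuming the parameter vector names it "phi" as ets-based models do:

# Estimated damping parameter for the damped Holt model
eggs.holtDamped$model$par[["phi"]]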

Next we will try using a Box-Cox transform to see if the data behave better in the smoothing functions once they have been transformed.

eggs.holt.Box <- holt(eggs,h=100, lambda="auto", biasadj = T)

autoplot(eggs.holt.Box) +
  scale_y_continuous(labels=dollar) +
  labs(title="U.S. Egg Prices",
       subtitle="(Per Dozen) - Holt after Box-Cox",
       caption = "*Bias-Adjusted for mean forecasts",
       x="Year", y="Price")

The Box-Cox adjusted data seem to behave better with the linear Holt method now, with the forecast acting a bit like a damped model.
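For reference, we can inspect the automatically selected Box-Cox parameter, which is essentially what lambda = "auto" chooses behind the scenes. A sketch:

# Lambda chosen for the eggs series
BoxCox.lambda(eggs)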

eggs.holtDamped.Box <- holt(eggs,h=100, damped=T, lambda="auto", biasadj = T)

autoplot(eggs.holtDamped.Box) +
  scale_y_continuous(labels=dollar) +
  labs(title="U.S. Egg Prices",
       subtitle="(Per Dozen) - Holt Damped after Box-Cox",
       caption = "*Bias-Adjusted for mean forecasts",
       x="Year", y="Price")

Interestingly, the damped version of the Holt smoothing predicts a slightly upward forecast when the data have been Box-Cox transformed.

This is somewhat odd because when you look at the Box-Cox transformed data side-by-side with the original data, the transformation seems to have done little to reduce variability:

cbind(original = eggs,transformed = BoxCox(eggs,lambda="auto")) %>%
  autoplot(facets=T) +
  scale_y_continuous(labels=dollar) +
  labs(title="U.S. Egg Prices",
       subtitle="(Per Dozen) - Original Data vs. Transformed",
       x="Year", y="Price")

Comparing the RMSE values of these models:

RMSE <- tibble(Model = "Holt", RMSE = sqrt(eggs.holt$model$mse))
RMSE <- rbind(RMSE, tibble(Model = "Holt Damped",
                           RMSE = sqrt(eggs.holtDamped$model$mse)))
RMSE <- rbind(RMSE, tibble(Model = "Holt Transformed",
                           RMSE = sqrt(eggs.holt.Box$model$mse)))
RMSE <- rbind(RMSE, tibble(Model = "Holt Damped Transformed",
                           RMSE = sqrt(eggs.holtDamped.Box$model$mse)))

kable(RMSE) %>% kable_styling(bootstrap_options = "striped", full_width = F)
Model                    RMSE
Holt                     26.582188
Holt Damped              26.540185
Holt Transformed          1.032217
Holt Damped Transformed   1.039187

Because the transformed models were fitted on the Box-Cox scale, their in-sample RMSEs are not directly comparable to the others. The accuracy() function, however, reports errors on the original scale of the data, so we can use it instead:

RMSE <- tibble(Model = "Holt", RMSE = accuracy(eggs.holt)[,"RMSE"])
RMSE <- rbind(RMSE, tibble(Model = "Holt Damped",
                           RMSE = accuracy(eggs.holtDamped)[,"RMSE"]))
RMSE <- rbind(RMSE, tibble(Model = "Holt Transformed",
                           RMSE = accuracy(eggs.holt.Box)[,"RMSE"]))
RMSE <- rbind(RMSE, tibble(Model = "Holt Damped Transformed",
                           RMSE = accuracy(eggs.holtDamped.Box)[,"RMSE"]))

kable(RMSE) %>% kable_styling(bootstrap_options = "striped", full_width = F)
Model                    RMSE
Holt                     26.58219
Holt Damped              26.54019
Holt Transformed         26.38689
Holt Damped Transformed  26.58589

We can see that the models’ RMSE values are all very close, which makes sense given that the Box-Cox transform did not seem to reduce the variance much. Strictly speaking, the Holt method with the transformation had the lowest RMSE, but only by the smallest of margins.


7.8 Recall your retail time series data (from Exercise 3 in Section 2.10).

# A3349414R series ("Turnover ;  Victoria ;  Liquor retailing ;")
retail <- readxl::read_xlsx("retail.xlsx",skip=1)
retail <- ts(retail[,"A3349414R"], frequency = 12, start = c(1982,4))
a) Why is multiplicative seasonality necessary for this series?

Multiplicative seasonality is necessary for this series because the seasonal fluctuations grow with the level of the series; the seasonal effect is proportional rather than constant. The X11 decomposition below shows this nicely:

retail.decomp <- seas(retail, x11="")
autoplot(retail.decomp)

b) Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
retail.hw <- hw(retail,seasonal="multiplicative",h=48)

retail.hw %>% autoplot() +
  labs(title="Liquor Retailing Turnover",subtitle="Victoria (Holt-Winters)",
  x="Date",y="Turnover")

The Holt-Winters method seems to present a nice forecast which appears to capture the seasonality well. Here we used \(h=48\) to show how the trend continues.
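The estimated smoothing parameters can be inspected as well. A sketch, assuming the usual ets parameter names:

# Smoothing parameters for the (non-damped) Holt-Winters fit
retail.hw$model$par[c("alpha", "beta", "gamma")]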

retail.hw.damp <- hw(retail,seasonal="multiplicative",damped=T,h=48)

retail.hw.damp %>% autoplot() +
  labs(title="Liquor Retailing Turnover",
       subtitle="Victoria (Holt-Winters Damped)",
  x="Date",y="Turnover")

The model with a damped trend flattens somewhat as the forecast horizon extends.

c) Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
RMSE <- tibble(Model = "Holt-Winters", RMSE = accuracy(retail.hw)[,"RMSE"])
RMSE <- rbind(RMSE, tibble(Model = "Holt-Winters Damped",
                           RMSE = accuracy(retail.hw.damp)[,"RMSE"]))

kable(RMSE) %>% kable_styling(bootstrap_options = "striped", full_width = F)
Model                RMSE
Holt-Winters         4.237051
Holt-Winters Damped  4.147425

The RMSE indicates that the damped version of the Holt-Winters method is slightly better than the non-damped version.

d) Check that the residuals from the best method look like white noise.
autoplot(retail.hw.damp$residuals) +
  labs(title="Residuals for Holt-Winter Method with Damping",
       x="Time", y="Value")

Looking at the residuals, there appear to be some hints of a pattern, so they may not be entirely white noise.
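A more formal check is available via checkresiduals(), which plots the residual ACF and histogram and runs a Ljung-Box test. A sketch:

# Residual diagnostics and Ljung-Box test for the damped Holt-Winters model
checkresiduals(retail.hw.damp)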

e) Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
# Split dataset
retail.train <- window(retail,end=c(2010,12))
retail.test <- window(retail,start=c(2011,1))

# Train Holt-Winter damped model
retail.model <- hw(retail.train,seasonal="multiplicative",damped=T)

# Check accuracy
accuracy(retail.model,retail.test) %>% kable() %>%
  kable_styling(bootstrap_options = "striped")
                      ME       RMSE       MAE        MPE      MAPE       MASE        ACF1   Theil’s U
Training set   0.4223714   3.717405  2.456063  0.3612467  4.543397  0.4221041  -0.0015053          NA
Test set       7.5260708  10.714507  8.794868  4.2972191  5.207060  1.5115045   0.4799290   0.3396356

Let’s compare to the seasonal naive forecast:

# Train a seasonal naive model
retail.snaive <- snaive(retail.train)

accuracy(retail.snaive,retail.test) %>% kable() %>%
  kable_styling(bootstrap_options = "striped")
                      ME       RMSE        MAE        MPE       MAPE      MASE       ACF1   Theil’s U
Training set    4.455255   8.699864   5.818619   6.154001   9.948117  1.000000  0.7261600          NA
Test set       19.170833  22.956217  19.520833  11.590392  11.813322  3.354891  0.5801161   0.7479721

Comparing test-set RMSE (10.71 vs. 22.96), the damped Holt-Winters method beats the seasonal naive forecast quite handily.


7.9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# Get the lambda value
l <- BoxCox.lambda(retail)

# Transform
retail.trans <- BoxCox(retail,lambda=l)

# Use STL decomposition
retail.stl <- mstl(retail.trans)

# Plot transformed data
autoplot(retail.trans, series="Transformed Data") +
  autolayer(seasadj(retail.stl), series="Seasonally-Adjusted") +
  scale_color_manual(name="",
                     values=c("Transformed Data"="grey25",
                              "Seasonally-Adjusted"="red3")) +
  labs(title="Liquor Retailing Turnover",
       subtitle=substitute("Victoria (Box-Cox Transformed [" *
                       lambda * "= " * l * "])", list(l = round(l,3))),
       x="Date",y="Turnover")

# Plot back-transformed data
autoplot(retail, series="Original Data") +
  autolayer(InvBoxCox(seasadj(retail.stl),lambda=l),
            series="Seasonally-Adjusted") +
  scale_color_manual(name="",
                     values=c("Original Data"="grey25",
                              "Seasonally-Adjusted"="red3")) +
  labs(title="Liquor Retailing Turnover",
       subtitle="Victoria (Backtransformed)",
       x="Date",y="Turnover")

The STL procedure looks to have done a good job of removing the seasonality of the data.
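We can also plot the STL components themselves to confirm that the seasonal component has been captured and the remainder looks noise-like. A sketch:

# Trend, seasonal, and remainder components from the mstl() decomposition
autoplot(retail.stl) +
  labs(title = "STL Decomposition of Box-Cox Transformed Turnover", x = "Date")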

# Split the data
retail.train <- window(seasadj(retail.stl),end=c(2010,12))
retail.test <- window(seasadj(retail.stl),start=c(2011,1))

# Fit the ETS model
retail.ets <- ets(retail.train)

retail.ets$method
## [1] "ETS(A,A,N)"

The ets() function chose an ETS(A,A,N) model (additive errors, additive trend, no seasonal component), which seems appropriate for a seasonally adjusted series.

Now we will forecast out and compare to the seasonally-adjusted data:

retail.forecast <- forecast(retail.ets, h=36, biasadj = T, lambda=l)

autoplot(retail.forecast) +
  autolayer(InvBoxCox(seasadj(retail.stl),lambda=l),
            series="Seasonally-Adjusted") +
  scale_color_manual(name="",
                     values=c("Seasonally-Adjusted"="red3")) +
  labs(title="Liquor Retailing Turnover",
       subtitle="Backtransformed",
       x="Date",y="Turnover")

Now let’s look at the RMSE and compare to the previous models:

accuracy(retail.forecast,retail.test) %>% kable() %>%
  kable_styling(bootstrap_options = "striped")
                         ME        RMSE          MAE            MPE           MAPE          MASE        ACF1  Theil’s U
Training set      0.0003719    0.044218    0.0325828     -0.0128085      0.9545306     0.3640067  -0.0085634         NA
Test set       -165.1163481  165.509643  165.1163481  -3601.9936425   3601.9936425  1844.6356244   0.9165682   4586.616

The errors for this method look enormous, but that is mostly an artifact of a scale mismatch: the forecasts were back-transformed to the original scale (via lambda = l), while retail.test here is still the Box-Cox transformed, seasonally adjusted series, so the two are not on the same scale.

If we instead compare against the back-transformed seasonally adjusted series, so that both sides are on the original scale, this method looks more reasonable:

accuracy(retail.forecast,InvBoxCox(seasadj(retail.stl),lambda=l)) %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped")
                       ME       RMSE         MAE         MPE       MAPE         MASE        ACF1  Theil’s U
Training set    0.0003719   0.044218   0.0325828  -0.0128085  0.9545306    0.3640067  -0.0085634         NA
Test set       -7.8998431  13.953127  10.9850179  -4.9897988  6.8368559  122.7216778   0.7717261   1.962175