Consider the pigs series, the number of pigs slaughtered in Victoria each month. Use the ses() function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
data(pigs)
ses.pigs <- pigs %>%
ses(h=4)
ses.pigs %>%
autoplot()
The plot above shows the pigs series with the four forecast values. As expected, the point forecasts produced by ses are all identical; this is by design. Note the widening prediction intervals as the forecasts move further ahead. Methods that model a trend or seasonality, used below, produce forecasts that change across the horizon.
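For reference, simple exponential smoothing maintains only a level component, updated as \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\), and every forecast is the final level: \(\hat{y}_{T+h|T} = \ell_T\) for all \(h\). This is why the point forecasts are flat.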
ses.pigs %>%
summary()
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = ., h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
## ACF1
## Training set 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
ses has fit the pigs series with \(\alpha=0.2971\) and \(\ell_0=77260.0561\).
The point forecasts for the next four months are all \(98816.41\); as discussed above, every future value predicted by ses is the same.
ses.pigs %>%
residuals() %>%
sd()
## [1] 10273.69
The standard deviation of the residuals is \(10273.69\). This gives a 95% interval for the first forecast of \(98816.41 \pm 1.96 \times 10273.69\), or \((78679.98,\ 118952.8)\).
Compared to R's interval of \((78611.97,\ 119020.8)\), the two are quite similar, although the interval produced by R is slightly wider. The difference is most likely because R uses the model's estimated \(\sigma = 10308.58\), which adjusts for the number of estimated parameters, rather than the raw residual standard deviation; rounding the normal quantile to \(1.96\) has only a tiny effect.
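As a quick sketch (reusing the ses.pigs object from above), the interval arithmetic can be reproduced directly; substituting the model's \(\sigma\) for the residual standard deviation recovers R's slightly wider bounds.
# Reproduce the 95% interval for the first forecast by hand
s <- sd(residuals(ses.pigs))
ses.pigs$mean[1] + c(-1, 1) * 1.96 * s
# Swapping in sigma = 10308.58 from the summary output widens
# the interval to roughly R's (78611.97, 119020.8)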
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
data(books)
books %>%
autoplot()
The books data set consists of two time series over the same period: the first is paperback sales and the second hardcover. Both series appear to trend upwards, indicating increasing sales of both formats over time, but there is substantial day-to-day noise. From visual approximation alone the data do not appear to have weekly seasonality, although a subtle seasonal pattern might be revealed by decomposing the series. There is also not much data to work with, which may make accurate prediction difficult.
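As a rough check on the weekly-seasonality question, one could look for a spike at lag 7 in the autocorrelation function (a quick sketch for the paperback series; the hardcover check is analogous).
# A lag-7 spike here would suggest weekly seasonality
books[, 1] %>%
ggAcf()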
Use the ses() function to forecast each series, and plot the forecasts.
ses.paperback <- books[, 1] %>%
ses()
ses.paperback %>%
autoplot() +
autolayer(fitted(ses.paperback), series='Fitted')
ses.hardcover <- books[, 2] %>%
ses()
ses.hardcover %>%
autoplot() +
autolayer(fitted(ses.hardcover), series='Fitted')
ses appears to have done a poor job of fitting these series. The fitted values do not track the trend in the data, and the forecast's prediction intervals span much of the range of the original data.
ses.paperback %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
ses.hardcover %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
The RMSE for paperback is \(33.63769\) and the RMSE for hardcover is \(31.93101\). These values likely serve as a baseline that a more suitable method should beat, since ses works best when the data have no trend or seasonal component, and the initial exploration suggests both series do have a trend.
books[, 1] %>%
holt(h=4)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
books[, 2] %>%
holt(h=4)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
As noted in the last question, these forecasts trend upwards (as opposed to staying flat).
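This is because Holt's method adds a trend component \(b_t\) to the level, so the \(h\)-step forecast is \(\hat{y}_{t+h|t} = \ell_t + h b_t\), increasing linearly in \(h\): by about \(1.25\) per day for paperbacks and about \(3.30\) per day for hardcovers in the output above.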
books[, 1] %>%
holt() %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
books[, 2] %>%
holt() %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
The RMSE for each series is smaller than the corresponding RMSE from the ses function, indicating that holt fits the data better than ses does. However, this does not necessarily mean holt is the superior method: as noted in the question, holt uses an additional smoothing parameter not present in ses, so it is unclear whether it has found a genuinely stronger fit or is simply overfitting the training data.
To make that call it would help to have held back some data for validation, so we could determine which method predicts better (via RMSE) on the validation set. I would also be less worried about overfitting with more (or less noisy) data.
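A minimal sketch of that idea for the paperback series, holding back the final four days as a makeshift test set (the 26/4 split and the train/test names are my own choices, not part of the exercise):
# Hold out the last 4 observations of paperback sales
train <- subset(books[, 1], end=26)
test <- subset(books[, 1], start=27)
# Compare out-of-sample RMSE for the two methods
accuracy(ses(train, h=4), test)['Test set', 'RMSE']
accuracy(holt(train, h=4), test)['Test set', 'RMSE']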
books[, 1] %>%
holt() %>%
autoplot()
books[, 2] %>%
holt() %>%
autoplot()
Comparing the holt forecast with the ses forecast plotted in the previous problem, the holt forecast seems likely to prove more accurate. As noted previously, ses works best when there is no trend or seasonality, and although there is not much data, both series appear to have an upwards trend that holt captures and ses cannot. However, the upward trend is unlikely to continue indefinitely, so we should entertain the idea of damping the predictions.
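A quick sketch of a damped variant for the paperback series (the hardcover version is analogous):
# With damped=TRUE the trend flattens as the horizon grows
books[, 1] %>%
holt(damped=TRUE, h=4) %>%
autoplot()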
books[, 1] %>%
ses(h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.867 275.3523
books[, 1] %>%
holt(h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.913 275.0205
forecast <- books[, 1] %>%
forecast(h=1)
bounds <- Metrics::rmse(books[, 1], forecast$fitted)*1.96
forecast$mean[1] - bounds
## [1] 141.1706
forecast$mean[1]
## [1] 207.1005
forecast$mean[1] + bounds
## [1] 273.0304
For the paperback data set the interval is comparable across all three methods; my calculated interval is slightly narrower than the two produced by R.
books[, 2] %>%
ses(h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
books[, 2] %>%
holt(h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.739 287.6087 192.9222 307.4256
forecast <- books[,2] %>%
forecast(h=1)
bounds <- Metrics::rmse(books[, 2], forecast$fitted)*1.96
forecast$mean[1] - bounds
## [1] 196.8588
forecast$mean[1]
## [1] 250.2017
forecast$mean[1] + bounds
## [1] 303.5447
My calculated interval this time sits closer to the holt interval than to the ses one. As mentioned above, ses has difficulty capturing the upward trend in the paperback and hardcover series.
In both cases the holt prediction interval covers a smaller range of values, indicating more confidence in the predictions than with ses. This makes sense, as holt can use the trend information in a way that ses cannot.
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use \(h=100\) when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
data(eggs)
autoplot(eggs) +
autolayer(holt(eggs, h=100), series='Holt')
autoplot(eggs) +
autolayer(holt(eggs, h=100, damped=TRUE), series='Damped')
autoplot(eggs) +
autolayer(holt(eggs, h=100, lambda=BoxCox.lambda(eggs)), series='BoxCox')
The standard holt forecast carries the recent trend in prices forward for one hundred years. This forecast is problematic for a number of reasons, not least that it predicts negative egg prices starting around 2025. The damped forecast attempts to address this issue: damping causes the forecast to level out as time passes, reflecting the idea that predictions become less reliable the further they move from actual data and that changes should steady rather than continue at their recent rate. It is hard to see in the plot due to the length of the horizon, but the damped forecast starts downwards like the standard holt before leveling out. Finally, the Box-Cox transformed model appears the strongest: it has correctly identified the somewhat steady exponential decay in egg prices since 1950 and continues that trend forward, and it also neatly prevents prices from dropping below zero.
I tried one more model combining the Box-Cox transformation and damping, and although I thought its forecast offered the best of both models, the corresponding prediction intervals appear far off base, suggesting prices could plausibly rise to pre-1950 levels again.
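For reference, a sketch of that combined model as I reconstructed it:
# Damped Holt on the Box-Cox transformed scale
autoplot(eggs) +
autolayer(holt(eggs, h=100, damped=TRUE, lambda=BoxCox.lambda(eggs)),
series='Damped BoxCox')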
Looking at the RMSE supports the idea that the Box-Cox transformation is the strongest, although all three models are very close.
eggs %>%
holt() %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04499087 26.58219 19.18491 -1.142201 9.653791 0.9463626
## ACF1
## Training set 0.01348202
eggs %>%
holt(damped=TRUE) %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
## ACF1
## Training set -0.003195358
eggs %>%
holt(lambda=BoxCox.lambda(eggs)) %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7736844 26.39376 18.96387 -1.072416 9.620095 0.9354593
## ACF1
## Training set 0.03887152
The original data provide strong motivation for subsetting the series starting around 1945. From that point forward there is a clear exponential decay that may be easier to model. The decade leading up to 1945 overlapped with the Great Depression, which likely played a large role in the abnormal price of eggs; this is supported by the fact that pre-depression egg prices generally fit the trend seen after 1945. Eliminating this abnormal stretch would likely result in a stronger model.
sub.eggs <- eggs %>%
subset(start=46)
sub.eggs %>%
autoplot() +
autolayer(holt(sub.eggs, damped=TRUE, lambda=BoxCox.lambda(sub.eggs), h=100))
sub.eggs %>%
holt(lambda=BoxCox.lambda(sub.eggs), damped=TRUE) %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03012025 17.23987 13.2045 -1.586251 9.435901 0.7759366
## ACF1
## Training set -0.09117129
In this case the combined damped and Box-Cox model produces what appears to be a sensible, strongly predictive fit, and the RMSE supports this.
Recall your retail time series data (from Exercise 3 in Section 2.10).
I reloaded the data and column that I selected from the retail file.
retaildata <- readxl::read_excel('./retail.xlsx', skip=1)
myts <- retaildata %>%
select(A3349335T) %>%
ts(frequency=12, start=c(1982, 4))
autoplot(myts)
Multiplicative seasonality is required because the size of the seasonal swings grows as the series progresses. If the change in value due to seasonality were constant (that is, ignoring trend), additive seasonality would model the data best. It is clear from the plot above that, separate from the trend, the swings are growing larger over time, indicating multiplicative seasonality.
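Concretely, with period \(m\) the additive Holt-Winters forecast is \(\hat{y}_{t+h|t} = \ell_t + hb_t + s_{t+h-m(k+1)}\), while the multiplicative form is \(\hat{y}_{t+h|t} = (\ell_t + hb_t)\,s_{t+h-m(k+1)}\), where \(k\) is the integer part of \((h-1)/m\); only in the latter do the seasonal swings scale with the level of the series.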
fit.1 <- myts %>%
hw(seasonal='multiplicative', level=c(0), h=50)
fit.2 <- myts %>%
hw(seasonal='multiplicative', level=c(0), damped = TRUE, h=50)
autoplot(myts) +
autolayer(fit.1, series='Multiplicative') +
autolayer(fit.2, series='Damped')
The forecast horizon has to be extended considerably before the difference between the damped and undamped models becomes apparent. With h=50 it is clear that the damped predictions are starting to level out while the undamped multiplicative forecasts continue to rise. The damped values are likely more representative of real future values, as the trend already appears to be leveling out.
fit.1 %>%
forecast(h=1) %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 0.9212824 25.20381 18.77683 0.06856226 1.979315 0.3016982
## ACF1
## Training set -0.1217931
fit.2 %>%
forecast(h=1) %>%
accuracy()
## ME RMSE MAE MPE MAPE MASE
## Training set 2.9059 25.10059 18.74334 0.2366077 1.993423 0.3011601
## ACF1
## Training set -0.1394101
The RMSE of the two models is nearly identical. Based on what I wrote above, I believe the damped model will generalize better as we look further into the future.
fit.2 %>%
residuals() %>%
autoplot()
There doesn't appear to be any pattern in the residuals; however, there is a bit of heteroskedasticity, with wider swings earlier in the plot. As a proof of concept, applying a Box-Cox transformation appears to produce random residuals, and also what looks to me like the strongest model yet.
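Recall that the Box-Cox transformation is \(w_t = (y_t^\lambda - 1)/\lambda\) for \(\lambda \neq 0\) (and \(w_t = \log y_t\) when \(\lambda = 0\)); the code below applies it manually with \(\lambda\) fixed at \(0.2\).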
# Manual Box-Cox transform with lambda fixed at 0.2; the transform makes
# the seasonality additive, hence seasonal='additive' below
ts((myts^0.2 - 1)/0.2, frequency=12, start=c(1982, 4)) %>%
hw(seasonal='additive', level=c(0), damped=TRUE, h=50) %>%
residuals() %>%
autoplot()
# The same transformed model's forecasts
ts((myts^0.2 - 1)/0.2, frequency=12, start=c(1982, 4)) %>%
hw(seasonal='additive', level=c(0), damped=TRUE, h=50) %>%
autoplot()
training <- myts %>%
subset(end=345)
testing <- myts %>%
subset(start=346)
fit.3 <- training %>%
hw(seasonal='multiplicative', level=c(0), damped = TRUE, h=36)
Metrics::rmse(testing, fit.3$mean)
## [1] 39.60879
The RMSE for my selected model is \(39.60879\). This greatly beats all of the naive approaches learned earlier, which is logical: the naive methods are just that, and ignore most of the data when making predictions.
naive <- training %>%
naive(h=36, level = c(0))
Metrics::rmse(testing, naive$mean)
## [1] 238.3404
naive <- training %>%
snaive(h=36, level = c(0))
Metrics::rmse(testing, naive$mean)
## [1] 145.4666
naive <- training %>%
meanf(h=36, level = c(0))
Metrics::rmse(testing, naive$mean)
## [1] 1153.681
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
lambda <- training %>%
BoxCox.lambda()
trans.ts <- training %>%
BoxCox(lambda=lambda) %>%
mstl()
trans.ts %>%
autoplot()
ets.ts <- trans.ts %>%
seasadj() %>%
ets()
ets.ts %>%
forecast(h=36) %>%
autoplot()
These plots look promising. The decomposition has found strong seasonality and isolated a roughly linear trend (thanks to the Box-Cox transformation and the removal of seasonality), and the predictions follow the adjusted data. To determine which model is better, we can compare performance on the held-out test data.
trans.forecast <- trans.ts %>%
seasadj() %>%
ets() %>%
forecast(h=36)
trans.predictions <- trans.forecast$mean %>%
InvBoxCox(lambda=lambda)
Metrics::rmse(testing, trans.predictions)
## [1] 159.7686
autoplot(training) +
autolayer(testing) +
autolayer(InvBoxCox(trans.forecast$mean, lambda=lambda))
This model falls far behind the model from the previous question (which had an RMSE of 39.60879). As noted there, I believe the damping effect produced the strong results, since there is clear evidence the data is leveling out, and none of the other models (including this one) take that into account. The seasonal adjustment also appears to make the model worse here. It is possible this is a case of over-adjusting the data by applying too many fixes, losing some of the signal along the way.
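One way to probe that damping hypothesis would be to force a damped trend in the ETS step and re-score against the test set. A sketch (the damped.fc name is my own; ets otherwise selects the trend type automatically):
# Force a damped trend on the seasonally adjusted series
damped.fc <- trans.ts %>%
seasadj() %>%
ets(damped=TRUE) %>%
forecast(h=36)
Metrics::rmse(testing, InvBoxCox(damped.fc$mean, lambda=lambda))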