Week 6: CUNY 624 Homework Assignment 5

Loading the necessary libraries.

library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth

7.1 Consider the pigs series - the number of pigs slaughtered in Victoria each month.

Loading in the pigs series.

data(pigs)
str(pigs)
##  Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...
paste0("Frequency: ", frequency(pigs))
## [1] "Frequency: 12"
head(pigs)
##         Jan    Feb    Mar    Apr    May    Jun
## 1980  76378  71947  33873  96428 105084  95741

This is a monthly time series from 1980 to 1996. According to ?pigs, this is the monthly total number of pigs slaughtered in Victoria, Australia.

A. Use the ses() function in R to find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.

Referring back to section 7.1 of the Forecasting textbook, simple exponential smoothing (SES) is a method suitable for forecasting data with no clear trend or seasonal pattern. Before we use the ses() function, let us check whether a trend or seasonal pattern exists.

autoplot(pigs) + ylab("Pigs Slaughtered")

ggseasonplot(pigs)

ggsubseriesplot(pigs)

gglagplot(pigs)

Acf(pigs, lag.max=150)

Interestingly, the plots above do not show an obvious trend or seasonal component, though there does appear to be some cyclical behaviour (as evidenced in the ACF plot). That said, given that there is no clear seasonality or trend, we can try simple exponential smoothing and see what we find.

fc <- ses(pigs)
summary(fc)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
##                    ACF1
## Training set 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8
## Jan 1996       98816.41 83448.52 114184.3 75313.25 122319.6
## Feb 1996       98816.41 82955.06 114677.8 74558.56 123074.2
## Mar 1996       98816.41 82476.49 115156.3 73826.66 123806.2
## Apr 1996       98816.41 82011.54 115621.3 73115.58 124517.2
## May 1996       98816.41 81559.12 116073.7 72423.66 125209.2
## Jun 1996       98816.41 81118.26 116514.6 71749.42 125883.4

As noted in the “Optimization” section, this function finds the values of the unknown parameters (here, \(\alpha\) and \(l_0\)) by minimizing the sum of squared errors:

\[SSE = \sum_{t=1}^{T}(y_t-\hat{y}_{t|t-1})^2 = \sum_{t=1}^{T}e^2_t\]

\(\alpha = 0.2971\) and \(l_0 = 77260.0561\), where \(\alpha\) is the smoothing parameter and \(l_0\) is the initial level. (Since ses() was called with its default horizon, the first four rows of the forecast table above, Sep–Dec 1995, are the four-month forecasts requested.)
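
As a quick check (a minimal sketch, assuming the usual structure of the ets object that ses() stores in fc$model, with parameter names "alpha" and "l"), the optimised parameters can be pulled directly from the fitted model, and the SSE being minimised is simply the sum of the squared one-step training errors:

# Optimised parameters stored in the fitted ets model (names assumed: "alpha", "l")
fc$model$par[c("alpha", "l")]
# The criterion ses() minimises: the sum of squared one-step-ahead training errors
sum(residuals(fc)^2)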

B. Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

From section 3.5, “Prediction Intervals”, a prediction interval gives an interval within which we expect \(y_t\) to lie with a specified probability. Generally, a prediction interval can be written as:

\[\hat{y}_{T+h|T} \pm c\hat{\sigma}_h\]

where the multiplier \(c\) depends on the coverage probability; 80% and 95% intervals are the most common.
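
As a small sanity check of the multiplier, the values of \(c\) for the two common coverage levels come straight from the standard normal distribution:

# Multipliers for 95% and 80% coverage (quantiles of the standard normal)
qnorm(0.975)   # approximately 1.96 (95% interval)
qnorm(0.90)    # approximately 1.28 (80% interval)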

For this particular question, we need to find the standard deviation of the residuals. According to section 3.3, “Residuals”, the residuals in a time series model are what is left over after fitting a model. For many models, the residuals are equal to the difference between the observations and the corresponding fitted values:

\[e_t = y_t - \hat{y}_t\]

The residuals should be uncorrelated and have a zero mean. Any forecasting method that does not satisfy these properties implies that the model can be improved. It is also useful if the residuals have a constant variance and are normally distributed. The residuals can be found via the residuals() function.

res <- residuals(fc)
autoplot(res) + ggtitle("Residuals from SES method on Pigs Dataset")

gghistogram(res) + ggtitle("Histogram of residuals")

ggAcf(res) + ggtitle("ACF of residuals")

For the most part, the residuals appear to meet the criteria above. Let us find the standard deviation of the residuals.

st_res <- sd(res)
paste0("Standard Deviation of Residuals: ", st_res)
## [1] "Standard Deviation of Residuals: 10273.693294987"

Now, using the forecast() function as noted in section 7.7, let us find the point forecast and the 95% prediction interval for the first forecast.

forecast(fc, h=1)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8

If we double check our math:

# 95% CI Upper
98816.41 + 1.96 * 10273.69
## [1] 118952.8
# 95% CI Lower
98816.41 - 1.96 * 10273.69
## [1] 78679.98

We can see that the numbers are very close, and that the interval produced by forecast() essentially matches the one computed by hand. The small difference arises because R builds its interval from the model’s estimated \(\sigma\) (10308.58) rather than from the sample standard deviation of the residuals (10273.69).
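
A minimal sketch of where R’s slightly wider interval comes from (assuming the usual ets object structure, where the estimated error variance is stored in fc$model$sigma2):

# R builds the one-step interval from the model's estimated sigma,
# not from the sample standard deviation of the residuals
s_model <- sqrt(fc$model$sigma2)
98816.41 + c(-1, 1) * 1.96 * s_model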

7.5 Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

Loading the books dataset.

data(books)
str(books)
##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"
head(books)
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
##   Paperback Hardcover
## 1       199       139
## 2       172       128
## 3       111       172
## 4       209       139
## 5       161       191
## 6       119       168

According to ?books, this is the daily sales of paperback and hardcover books at the same store.

A. Plot the series and discuss the main features of the data.

Let us visualize the series.

# Seasonal plots are omitted because the data are daily with frequency 1, so there is no seasonal period to plot.
autoplot(books) + ylab("Books sold per day")

Let’s look at the lag and ACF plots for the paperbacks and the hardcovers separately.

Paperbacks:

books[,"Paperback"]
## Time Series:
## Start = 1 
## End = 30 
## Frequency = 1 
##  [1] 199 172 111 209 161 119 195 195 131 183 143 141 168 201 155 243 225
## [18] 167 237 202 186 176 232 195 190 182 222 217 188 247
autoplot(books[,"Paperback"]) + ggtitle("Plot of Paperback")

gglagplot(books[,"Paperback"])

Acf(books[,"Paperback"], lag.max=150)

There appears to be no obvious cyclical or seasonal component in the paperback series, though the plot may suggest an upward trend. The fact that most of the ACF spikes fall within the dashed blue bounds suggests that there is little significant autocorrelation.

Let’s look at the hardcovers.

books[,"Hardcover"]
## Time Series:
## Start = 1 
## End = 30 
## Frequency = 1 
##  [1] 139 128 172 139 191 168 170 145 184 135 218 198 230 222 206 240 189
## [18] 222 158 178 217 261 238 240 214 200 201 283 220 259
autoplot(books[,"Hardcover"]) + ggtitle("Plot of Hardcover")

gglagplot(books[,"Hardcover"])

Acf(books[,"Hardcover"], lag.max=150)

Similar to the paperbacks, there is no obvious seasonal or cyclical component; however, again, there may be an upward trend in the hardcovers.

B. Use the ses() function to forecast each series, and plot the forecasts.

According to section 7.1, ses() is likely not the best exponential smoothing method here, as there appears to be a trend component. Nonetheless, for the sake of the question, we will apply the ses() function.

ses_paperback <- ses(books[,'Paperback'])
summary(ses_paperback)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = books[, "Paperback"]) 
## 
##   Smoothing parameters:
##     alpha = 0.1685 
## 
##   Initial states:
##     l = 170.8271 
## 
##   sigma:  34.8183
## 
##      AIC     AICc      BIC 
## 318.9747 319.8978 323.1783 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005
## 35       207.1097 160.0215 254.1979 135.0945 279.1249
## 36       207.1097 159.4247 254.7946 134.1818 280.0375
## 37       207.1097 158.8353 255.3840 133.2804 280.9389
## 38       207.1097 158.2531 255.9663 132.3899 281.8294
## 39       207.1097 157.6777 256.5417 131.5099 282.7094
## 40       207.1097 157.1089 257.1105 130.6400 283.5793
autoplot(ses_paperback) + ggtitle("SES for Paperbacks")

We will do the same for the hardcover books.

ses_hardcover <- ses(books[,'Hardcover'])
summary(ses_hardcover)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = books[, "Hardcover"]) 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2861 
## 
##   sigma:  33.0517
## 
##      AIC     AICc      BIC 
## 315.8506 316.7737 320.0542 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
## 32       239.5601 194.9788 284.1414 171.3788 307.7414
## 33       239.5601 192.8607 286.2595 168.1396 310.9806
## 34       239.5601 190.8347 288.2855 165.0410 314.0792
## 35       239.5601 188.8895 290.2306 162.0662 317.0540
## 36       239.5601 187.0164 292.1038 159.2014 319.9188
## 37       239.5601 185.2077 293.9124 156.4353 322.6848
## 38       239.5601 183.4574 295.6628 153.7584 325.3618
## 39       239.5601 181.7600 297.3602 151.1625 327.9577
## 40       239.5601 180.1111 299.0091 148.6406 330.4795
autoplot(ses_hardcover) + ggtitle("SES for Hardcovers")

C. Compute the RMSE values for the training data in each case.

As noted above, for paperbacks:

RMSE: 33.63769

For hardcovers:

RMSE: 31.93101
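
These values were read off the summary output above. As a quick sketch, the same numbers can also be pulled programmatically from accuracy(), which avoids transcription errors:

# Training-set error measures for each SES fit; the RMSE column holds the values quoted above
accuracy(ses_paperback)[, "RMSE"]
accuracy(ses_hardcover)[, "RMSE"]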

7.6

A. Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

From section 7.2, “Holt’s linear trend method”, Holt extended simple exponential smoothing to allow the forecasting of data with a trend. This method involves a forecast equation and two smoothing equations (one for the level and one for the trend):

\[\text{Forecast equation: }\hat{y}_{t+h|t}=l_t + hb_t \] \[\text{Level equation: }l_t =\alpha y_t +(1-\alpha)(l_{t-1}+b_{t-1}) \] \[\text{Trend equation: }b_t=\beta^*(l_t-l_{t-1})+(1-\beta^*)b_{t-1} \] where \(l_t\) denotes an estimate of the level of the series at time t, \(b_t\) denotes an estimate of the trend (slope) of the series at time t, \(\alpha\) is the smoothing parameter for the level, \(0 \leq \alpha \leq 1\), and \(\beta^*\) is the smoothing parameter for the trend, \(0 \leq \beta^* \leq 1\).
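
To make the recursions above concrete, here is a minimal, illustrative implementation of Holt’s linear method in R. The fixed alpha and beta values and the simple initialisation are assumptions chosen for demonstration only; holt() optimises these quantities instead.

# A bare-bones version of the Holt recursions (for illustration only)
holt_manual <- function(y, alpha = 0.2, beta = 0.1, h = 4) {
  n <- length(y)
  level <- numeric(n)
  trend <- numeric(n)
  level[1] <- y[1]          # naive initial level l_1
  trend[1] <- y[2] - y[1]   # naive initial trend b_1
  for (t in 2:n) {
    level[t] <- alpha * y[t] + (1 - alpha) * (level[t - 1] + trend[t - 1])
    trend[t] <- beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
  }
  level[n] + (1:h) * trend[n]   # h-step-ahead forecasts: l_T + h * b_T
}
holt_manual(as.numeric(books[, "Paperback"]), h = 4)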

This can be accomplished with the holt() function.

Paperback:

pc_holt <- holt(books[,'Paperback'], h=4)
summary(pc_holt)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, "Paperback"], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 170.699 
##     b = 1.2621 
## 
##   sigma:  33.4464
## 
##      AIC     AICc      BIC 
## 318.3396 320.8396 325.3456 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735
autoplot(pc_holt) + ggtitle("Holt for Paperback")

Hardcover:

hc_holt <- holt(books[,'Hardcover'], h=4)
summary(hc_holt)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, "Hardcover"], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 147.7935 
##     b = 3.303 
## 
##   sigma:  29.2106
## 
##      AIC     AICc      BIC 
## 310.2148 312.7148 317.2208 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334
autoplot(hc_holt) + ggtitle("Holt for Hardcovers")

Just to note, both \(\alpha\) and \(\beta^*\) are estimated to be nearly 0. The very small value of \(\beta^*\) means the slope hardly changes over time, and the very small \(\alpha\) means the level is smoothed heavily, so earlier observations carry relatively more weight when estimating the level.

B. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

Looking at the training RMSE values above, Holt’s method did better for both series.

RMSE for Paperback: SES 33.63769, Holt 31.13692

RMSE for Hardcover: SES 31.93101, Holt 27.19358

This is what we expected. When we initially described the time series, there appeared to be an upward trend. ses() assumes that there is no seasonality or trend, which is why it was the wrong method to use here. As holt() incorporates a trend, it is no surprise that Holt’s linear trend method performed better than simple exponential smoothing.

C. Compare the forecasts for the two series using both methods. Which do you think is best?

Using ses(), the forecasts for both the paperback and hardcover series are essentially flat (the same value at every horizon), whereas holt() produces increasing point forecasts that capture the upward trend. The better method therefore appears to be the Holt method.

However, there is a big caveat here: we are comparing these methods by their training-set RMSE. To better evaluate forecast accuracy, we should either split the data into training and test sets, or use time series cross-validation.

According to section 3.4, “A more sophisticated version of training/test sets is time series cross-validation. In this procedure, there are a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. Since it is not possible to obtain a reliable forecast based on a small training set, the earliest observations are not considered as test sets.”

Let us see whether time series cross-validation agrees that the Holt method is best.

# Cross-validated one-step-ahead forecast errors for each method and series
tsCV_ses_p <- tsCV(books[,'Paperback'], ses, h=1)
tsCV_holt_p <- tsCV(books[,'Paperback'], holt, h=1)
tsCV_ses_h <- tsCV(books[,'Hardcover'], ses, h=1)
tsCV_holt_h <- tsCV(books[,'Hardcover'], holt, h=1)
rmse_ses_p <- sqrt(mean(tsCV_ses_p^2, na.rm=TRUE))
rmse_holt_p <- sqrt(mean(tsCV_holt_p^2, na.rm=TRUE))
rmse_ses_h <- sqrt(mean(tsCV_ses_h^2, na.rm=TRUE))
rmse_holt_h <- sqrt(mean(tsCV_holt_h^2, na.rm=TRUE))

paste0("RMSE Test Set SES Paperback: ", rmse_ses_p)
## [1] "RMSE Test Set SES Paperback: 41.1091347030843"
paste0("RMSE Test Set Holt Paperback: ", rmse_holt_p)
## [1] "RMSE Test Set Holt Paperback: 40.9785871149664"
paste0("RMSE Test Set SES Hardcover: ", rmse_ses_h)
## [1] "RMSE Test Set SES Hardcover: 34.1833507755103"
paste0("RMSE Test Set Holt Paperback: ", rmse_holt_h)
## [1] "RMSE Test Set Holt Paperback: 34.5581188956473"

Interestingly, under cross-validation the Holt method does better for the paperback series but slightly worse for the hardcover series. However, given that SES does not account for trend whereas Holt does, we should probably still prefer the Holt method over SES in this situation.

D. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

Paperback:

# SES Paperback
summary(forecast(ses_paperback, h=1))
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = books[, "Paperback"]) 
## 
##   Smoothing parameters:
##     alpha = 0.1685 
## 
##   Initial states:
##     l = 170.8271 
## 
##   sigma:  34.8183
## 
##      AIC     AICc      BIC 
## 318.9747 319.8978 323.1783 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.867 275.3523
# Holt Paperback
summary(forecast(pc_holt, h=1))
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, "Paperback"], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 170.699 
##     b = 1.2621 
## 
##   sigma:  33.4464
## 
##      AIC     AICc      BIC 
## 318.3396 320.8396 325.3456 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.913 275.0205

The 95% prediction interval for SES Paperback: 138.867 to 275.3523

The 95% prediction interval for Holt Paperback: 143.913 to 275.0205

The 95% prediction interval is narrower for the Holt method than for SES, reflecting Holt’s smaller residual variance.

Hardcover:

# SES Hardcover
summary(forecast(ses_hardcover, h=1))
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = books[, "Hardcover"]) 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2861 
## 
##   sigma:  33.0517
## 
##      AIC     AICc      BIC 
## 315.8506 316.7737 320.0542 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
# Holt Hardcover
summary(forecast(hc_holt, h=1))
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, "Hardcover"], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 147.7935 
##     b = 3.303 
## 
##   sigma:  29.2106
## 
##      AIC     AICc      BIC 
## 310.2148 312.7148 317.2208 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
## 
## Forecasts:
##    Point Forecast   Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.739 287.6087 192.9222 307.4256

The 95% prediction interval for SES Hardcover: 174.7799 to 304.3403

The 95% prediction interval for Holt Hardcover: 192.9222 to 307.4256

Similar to the paperback series, the 95% prediction interval is narrower for the Holt method.
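
The question also asks us to build the intervals directly from the training RMSE values, assuming normal errors. A minimal sketch, taking the point forecasts and RMSEs from the summaries above:

# 95% intervals as point forecast +/- 1.96 * training RMSE
207.1097 + c(-1, 1) * 1.96 * 33.63769   # SES, Paperback
209.4668 + c(-1, 1) * 1.96 * 31.13692   # Holt, Paperback
239.5601 + c(-1, 1) * 1.96 * 31.93101   # SES, Hardcover
250.1739 + c(-1, 1) * 1.96 * 27.19358   # Holt, Hardcover

These RMSE-based intervals come out slightly narrower than those reported by ses() and holt(), because the functions construct their intervals from the model’s estimated \(\sigma\), which is a little larger than the training RMSE.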

7.7 For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900 - 1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when using holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Loading the eggs dataset.

data(eggs)
str(eggs)
##  Time-Series [1:94] from 1900 to 1993: 277 315 315 321 315 ...
head(eggs)
## Time Series:
## Start = 1900 
## End = 1905 
## Frequency = 1 
## [1] 276.79 315.42 314.87 321.25 314.54 317.92
autoplot(eggs) + ggtitle("Price of Dozen Eggs in US 1900-1993")

gglagplot(eggs)

Acf(eggs)

There is no obvious seasonal pattern in this series, but the time series plot shows a clear downward trend (and the ACF dies out only slowly, as expected for a trended series), which makes holt() an appropriate method here.

As noted in section 7.2, the holt() function can be modified such that we can perform transformations or use the damped holt method.

While holt() certainly performs much better than ses() for time series with a trend, it is not without its faults. “The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Gardner & McKenzie introduced a parameter that ‘dampens’ the trend to a flat line some time in the future. Methods that include a damped trend have proven to be very successful.” Using \(\phi\) as the damping parameter, this method can be described as:

\[\hat{y}_{t+h|t} = l_t + (\phi + \phi^2 + ... + \phi^h)b_t\] \[l_t = \alpha y_t + (1 - \alpha)(l_{t-1} + \phi b_{t-1})\] \[b_t = \beta^*(l_t-l_{t-1})+(1-\beta^*)\phi b_{t-1}\]

If \(\phi = 1\), the method is identical to Holt’s linear method. In practice, \(\phi\) is rarely less than 0.8 as the damping has a very strong effect for smaller values. Values of \(\phi\) close to 1 will mean that a damped model is not able to be distinguished from a non-damped model.
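
To build some intuition for \(\phi\), a small experiment (the specific values 0.8 and 0.98 are chosen here only for illustration) is to fix the damping parameter at two different values and compare the damped forecasts visually:

# Compare a heavily damped trend (phi = 0.8) with an almost undamped one (phi = 0.98)
fc_phi80 <- holt(eggs, damped = TRUE, phi = 0.8, h = 100)
fc_phi98 <- holt(eggs, damped = TRUE, phi = 0.98, h = 100)
autoplot(eggs) +
  autolayer(fc_phi80, series = "phi = 0.80", PI = FALSE) +
  autolayer(fc_phi98, series = "phi = 0.98", PI = FALSE) +
  ggtitle("Damped Holt forecasts for different values of phi")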

According to section 3.2, a Box-Cox transformation is a family of transformations that includes both logarithms and power transformations. A good value of \(\lambda\) is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler.

Let us see which model would best fit this time series.

Which model gives the best RMSE?

Holt method with Box-Cox transformation:

lambda <- BoxCox.lambda(eggs)

holt_bc <- holt(BoxCox(eggs, lambda), h=100)
summary(holt_bc)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = BoxCox(eggs, lambda), h = 100) 
## 
##   Smoothing parameters:
##     alpha = 0.809 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 21.0322 
##     b = -0.1144 
## 
##   sigma:  1.0549
## 
##      AIC     AICc      BIC 
## 443.0310 443.7128 455.7475 
## 
## Error measures:
##                       ME     RMSE       MAE        MPE     MAPE      MASE
## Training set 0.002274669 1.032217 0.7627104 -0.3021486 4.367562 0.9355491
##                    ACF1
## Training set 0.03017351
## 
## Forecasts:
##      Point Forecast        Lo 80    Hi 80         Lo 95    Hi 95
## 1994    10.36032235   9.00840604 11.71224   8.292744767 12.42790
## 1995    10.24591443   8.50688632 11.98494   7.586300443 12.90553
## 1996    10.13150650   8.07698573 12.18603   6.989388130 13.27362
## 1997    10.01709858   7.68939837 12.34480   6.457188310 13.57701
## 1998     9.90269065   7.33060652 12.47477   5.969027426 13.83635
## 1999     9.78828272   6.99304728 12.58352   5.513339016 14.06323
## 2000     9.67387480   6.67198180 12.67577   5.082875643 14.26487
## 2001     9.55946687   6.36420822 12.75473   4.672740458 14.44619
## 2002     9.44505895   6.06744289 12.82268   4.279440957 14.61068
## 2003     9.33065102   5.77998931 12.88131   3.900382536 14.76092
## 2004     9.21624310   5.50054625 12.93194   3.533575152 14.89891
## 2005     9.10183517   5.22808974 12.97558   3.177452781 15.02622
## 2006     8.98742724   4.96179681 13.01306   2.830756786 15.14410
## 2007     8.87301932   4.70099423 13.04504   2.492457560 15.25358
## 2008     8.75861139   4.44512296 13.07210   2.161700112 15.35552
## 2009     8.64420347   4.19371272 13.09469   1.837765229 15.45064
## 2010     8.52979554   3.94636346 13.11323   1.520041083 15.53955
## 2011     8.41538761   3.70273150 13.12804   1.208002045 15.62277
## 2012     8.30097969   3.46251897 13.13944   0.901192586 15.70077
## 2013     8.18657176   3.22546574 13.14768   0.599214853 15.77393
## 2014     8.07216384   2.99134301 13.15298   0.301718936 15.84261
## 2015     7.95775591   2.75994829 13.15556   0.008395149 15.90712
## 2016     7.84334798   2.53110135 13.15559  -0.281032153 15.96773
## 2017     7.72894006   2.30464092 13.15324  -0.566809598 16.02469
## 2018     7.61453213   2.08042203 13.14864  -0.849158896 16.07822
## 2019     7.50012421   1.85831380 13.14193  -1.128280229 16.12853
## 2020     7.38571628   1.63819756 13.13324  -1.404355065 16.17579
## 2021     7.27130835   1.41996535 13.12265  -1.677548523 16.22017
## 2022     7.15690043   1.20351861 13.11028  -1.948011357 16.26181
## 2023     7.04249250   0.98876703 13.09622  -2.215881652 16.30087
## 2024     6.92808458   0.77562768 13.08054  -2.481286262 16.33746
## 2025     6.81367665   0.56402414 13.06333  -2.744342051 16.37170
## 2026     6.69926873   0.35388584 13.04465  -3.005156958 16.40369
## 2027     6.58486080   0.14514742 13.02457  -3.263830924 16.43355
## 2028     6.47045287  -0.06225176 13.00316  -3.520456698 16.46136
## 2029     6.35604495  -0.26836810 12.98046  -3.775120540 16.48721
## 2030     6.24163702  -0.47325416 12.95653  -4.027902838 16.51118
## 2031     6.12722910  -0.67695904 12.93142  -4.278878658 16.53334
## 2032     6.01282117  -0.87952863 12.90517  -4.528118217 16.55376
## 2033     5.89841324  -1.08100596 12.87783  -4.775687314 16.57251
## 2034     5.78400532  -1.28143143 12.84944  -5.021647711 16.58966
## 2035     5.66959739  -1.48084298 12.82004  -5.266057468 16.60525
## 2036     5.55518947  -1.67927637 12.78966  -5.508971248 16.61935
## 2037     5.44078154  -1.87676529 12.75833  -5.750440590 16.63200
## 2038     5.32637361  -2.07334156 12.72609  -5.990514149 16.64326
## 2039     5.21196569  -2.26903524 12.69297  -6.229237921 16.65317
## 2040     5.09755776  -2.46387482 12.65899  -6.466655442 16.66177
## 2041     4.98314984  -2.65788726 12.62419  -6.702807965 16.66911
## 2042     4.86874191  -2.85109815 12.58858  -6.937734626 16.67522
## 2043     4.75433399  -3.04353180 12.55220  -7.171472593 16.68014
## 2044     4.63992606  -3.23521130 12.51506  -7.404057203 16.68391
## 2045     4.52551813  -3.42615866 12.47719  -7.635522082 16.68656
## 2046     4.41111021  -3.61639480 12.43862  -7.865899263 16.68812
## 2047     4.29670228  -3.80593971 12.39934  -8.095219287 16.68862
## 2048     4.18229436  -3.99481244 12.35940  -8.323511299 16.68810
## 2049     4.06788643  -4.18303119 12.31880  -8.550803136 16.68658
## 2050     3.95347850  -4.37061335 12.27757  -8.777121408 16.68408
## 2051     3.83907058  -4.55757559 12.23572  -9.002491570 16.68063
## 2052     3.72466265  -4.74393382 12.19326  -9.226937995 16.67626
## 2053     3.61025473  -4.92970332 12.15021  -9.450484034 16.67099
## 2054     3.49584680  -5.11489873 12.10659  -9.673152076 16.66485
## 2055     3.38143887  -5.29953410 12.06241  -9.894963602 16.65784
## 2056     3.26703095  -5.48362290 12.01768 -10.115939238 16.65000
## 2057     3.15262302  -5.66717811 11.97242 -10.336098798 16.64134
## 2058     3.03821510  -5.85021216 11.92664 -10.555461332 16.63189
## 2059     2.92380717  -6.03273705 11.88035 -10.774045164 16.62166
## 2060     2.80939925  -6.21476431 11.83356 -10.991867929 16.61067
## 2061     2.69499132  -6.39630503 11.78629 -11.208946614 16.59893
## 2062     2.58058339  -6.57736993 11.73854 -11.425297585 16.58646
## 2063     2.46617547  -6.75796933 11.69032 -11.640936621 16.57329
## 2064     2.35176754  -6.93811316 11.64165 -11.855878943 16.55941
## 2065     2.23735962  -7.11781104 11.59253 -12.070139240 16.54486
## 2066     2.12295169  -7.29707225 11.54298 -12.283731698 16.52964
## 2067     2.00854376  -7.47590574 11.49299 -12.496670018 16.51376
## 2068     1.89413584  -7.65432017 11.44259 -12.708967445 16.49724
## 2069     1.77972791  -7.83232391 11.39178 -12.920636784 16.48009
## 2070     1.66531999  -8.00992507 11.34057 -13.131690424 16.46233
## 2071     1.55091206  -8.18713149 11.28896 -13.342140354 16.44396
## 2072     1.43650413  -8.36395075 11.23696 -13.551998182 16.42501
## 2073     1.32209621  -8.54039021 11.18458 -13.761275152 16.40547
## 2074     1.20768828  -8.71645699 11.13183 -13.969982158 16.38536
## 2075     1.09328036  -8.89215800 11.07872 -14.178129762 16.36469
## 2076     0.97887243  -9.06749993 11.02524 -14.385728205 16.34347
## 2077     0.86446450  -9.24248928 10.97142 -14.592787423 16.32172
## 2078     0.75005658  -9.41713235 10.91725 -14.799317057 16.29943
## 2079     0.63564865  -9.59143527 10.86273 -15.005326467 16.27662
## 2080     0.52124073  -9.76540397 10.80789 -15.210824744 16.25331
## 2081     0.40683280  -9.93904424 10.75271 -15.415820717 16.22949
## 2082     0.29242488 -10.11236168 10.69721 -15.620322969 16.20517
## 2083     0.17801695 -10.28536174 10.64140 -15.824339840 16.18037
## 2084     0.06360902 -10.45804974 10.58527 -16.027879442 16.15510
## 2085    -0.05079890 -10.63043083 10.52883 -16.230949665 16.12935
## 2086    -0.16520683 -10.80251002 10.47210 -16.433558187 16.10314
## 2087    -0.27961475 -10.97429221 10.41506 -16.635712478 16.07648
## 2088    -0.39402268 -11.14578215 10.35774 -16.837419814 16.04937
## 2089    -0.50843061 -11.31698448 10.30012 -17.038687278 16.02183
## 2090    -0.62283853 -11.48790370 10.24223 -17.239521771 15.99384
## 2091    -0.73724646 -11.65854421 10.18405 -17.439930016 15.96544
## 2092    -0.85165438 -11.82891030 10.12560 -17.639918566 15.93661
## 2093    -0.96606231 -11.99900614 10.06688 -17.839493811 15.90737
autoplot(holt_bc)

Damped Holt method:

damped <- holt(eggs, damped=TRUE, h=100)
summary(damped)
## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100, damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.8462 
##     beta  = 0.004 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 276.9842 
##     b = 4.9966 
## 
##   sigma:  27.2755
## 
##      AIC     AICc      BIC 
## 1055.458 1056.423 1070.718 
## 
## Error measures:
##                     ME     RMSE     MAE       MPE     MAPE      MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
##                      ACF1
## Training set -0.003195358
## 
## Forecasts:
##      Point Forecast        Lo 80     Hi 80       Lo 95    Hi 95
## 1994       62.84884   27.8938665  97.80381    9.389822 116.3079
## 1995       62.79968   16.9363788 108.66299   -7.342188 132.9416
## 1996       62.76036    8.0760016 117.44472  -20.872149 146.3929
## 1997       62.72890    0.4263979 125.03140  -32.554554 158.0124
## 1998       62.70373   -6.4067675 131.81423  -42.991656 168.3991
## 1999       62.68360  -12.6402008 138.00740  -52.514212 177.8814
## 2000       62.66749  -18.4083657 143.74335  -61.327332 186.6623
## 2001       62.65461  -23.8016345 149.11085  -69.568803 194.8780
## 2002       62.64430  -28.8842950 154.17289  -77.336605 202.6252
## 2003       62.63605  -33.7040638 158.97616  -84.703440 209.9755
## 2004       62.62945  -38.2975438 163.55645  -91.725068 216.9840
## 2005       62.62417  -42.6935603 167.94191  -98.445402 223.6938
## 2006       62.61995  -46.9153057 172.15521 -104.899769 230.1397
## 2007       62.61657  -50.9817746 176.21492 -111.117108 236.3503
## 2008       62.61387  -54.9087594 180.13650 -117.121483 242.3492
## 2009       62.61171  -58.7095595 183.93298 -122.933160 248.1566
## 2010       62.60998  -62.3954996 187.61546 -128.569404 253.7894
## 2011       62.60860  -65.9763173 191.19351 -134.045059 259.2623
## 2012       62.60749  -69.4604571 194.67544 -139.373005 264.5880
## 2013       62.60660  -72.8552982 198.06851 -144.564498 269.7777
## 2014       62.60590  -76.1673328 201.37913 -149.629443 274.8412
## 2015       62.60533  -79.4023080 204.61297 -154.576610 279.7873
## 2016       62.60488  -82.5653397 207.77509 -159.413810 284.6236
## 2017       62.60451  -85.6610050 210.87003 -164.148030 289.3571
## 2018       62.60422  -88.6934183 213.90187 -168.785552 293.9940
## 2019       62.60399  -91.6662933 216.87428 -173.332049 298.5400
## 2020       62.60381  -94.5829961 219.79061 -177.792663 303.0003
## 2021       62.60366  -97.4465882 222.65390 -182.172070 307.3794
## 2022       62.60354 -100.2598639 225.46694 -186.474541 311.6816
## 2023       62.60344 -103.0253816 228.23227 -190.703985 315.9109
## 2024       62.60337 -105.7454907 230.95223 -194.863993 320.0707
## 2025       62.60331 -108.4223544 233.62897 -198.957870 324.1645
## 2026       62.60326 -111.0579701 236.26449 -202.988671 328.1952
## 2027       62.60322 -113.6541863 238.86062 -206.959220 332.1657
## 2028       62.60319 -116.2127174 241.41909 -210.872140 336.0785
## 2029       62.60316 -118.7351575 243.94148 -214.729866 339.9362
## 2030       62.60314 -121.2229914 246.42928 -218.534669 343.7410
## 2031       62.60313 -123.6776050 248.88386 -222.288668 347.4949
## 2032       62.60311 -126.1002941 251.30652 -225.993844 351.2001
## 2033       62.60310 -128.4922724 253.69848 -229.652054 354.8583
## 2034       62.60310 -130.8546789 256.06087 -233.265039 358.4712
## 2035       62.60309 -133.1885838 258.39476 -236.834435 362.0406
## 2036       62.60308 -135.4949941 260.70116 -240.361782 365.5679
## 2037       62.60308 -137.7748593 262.98102 -243.848533 369.0547
## 2038       62.60308 -140.0290751 265.23523 -247.296057 372.5022
## 2039       62.60307 -142.2584882 267.46464 -250.705648 375.9118
## 2040       62.60307 -144.4638997 269.67004 -254.078533 379.2847
## 2041       62.60307 -146.6460685 271.85221 -257.415871 382.6220
## 2042       62.60307 -148.8057141 274.01185 -260.718763 385.9249
## 2043       62.60307 -150.9435200 276.14965 -263.988255 389.1944
## 2044       62.60307 -153.0601355 278.26627 -267.225338 392.4315
## 2045       62.60307 -155.1561786 280.36231 -270.430959 395.6371
## 2046       62.60307 -157.2322378 282.43837 -273.606018 398.8121
## 2047       62.60306 -159.2888738 284.49500 -276.751371 401.9575
## 2048       62.60306 -161.3266220 286.53275 -279.867837 405.0740
## 2049       62.60306 -163.3459932 288.55212 -282.956199 408.1623
## 2050       62.60306 -165.3474759 290.55360 -286.017203 411.2233
## 2051       62.60306 -167.3315372 292.53766 -289.051562 414.2577
## 2052       62.60306 -169.2986243 294.50475 -292.059962 417.2661
## 2053       62.60306 -171.2491656 296.45529 -295.043058 420.2492
## 2054       62.60306 -173.1835715 298.38970 -298.001476 423.2076
## 2055       62.60306 -175.1022361 300.30836 -300.935821 426.1419
## 2056       62.60306 -177.0055375 302.21166 -303.846669 429.0528
## 2057       62.60306 -178.8938390 304.09997 -306.734577 431.9407
## 2058       62.60306 -180.7674896 305.97362 -309.600078 434.8062
## 2059       62.60306 -182.6268253 307.83295 -312.443686 437.6498
## 2060       62.60306 -184.4721691 309.67830 -315.265896 440.4720
## 2061       62.60306 -186.3038323 311.50996 -318.067183 443.2733
## 2062       62.60306 -188.1221147 313.32824 -320.848006 446.0541
## 2063       62.60306 -189.9273054 315.13343 -323.608807 448.8149
## 2064       62.60306 -191.7196831 316.92581 -326.350012 451.5561
## 2065       62.60306 -193.4995169 318.70564 -329.072033 454.2782
## 2066       62.60306 -195.2670664 320.47319 -331.775267 456.9814
## 2067       62.60306 -197.0225826 322.22871 -334.460097 459.6662
## 2068       62.60306 -198.7663080 323.97243 -337.126895 462.3330
## 2069       62.60306 -200.4984769 325.70460 -339.776019 464.9821
## 2070       62.60306 -202.2193162 327.42544 -342.407816 467.6139
## 2071       62.60306 -203.9290453 329.13517 -345.022621 470.2287
## 2072       62.60306 -205.6278766 330.83400 -347.620759 472.8269
## 2073       62.60306 -207.3160160 332.52214 -350.202545 475.4087
## 2074       62.60306 -208.9936627 334.19979 -352.768284 477.9744
## 2075       62.60306 -210.6610101 335.86714 -355.318272 480.5244
## 2076       62.60306 -212.3182455 337.52437 -357.852795 483.0589
## 2077       62.60306 -213.9655507 339.17168 -360.372131 485.5783
## 2078       62.60306 -215.6031021 340.80923 -362.876550 488.0827
## 2079       62.60306 -217.2310709 342.43720 -365.366313 490.5724
## 2080       62.60306 -218.8496235 344.05575 -367.841676 493.0478
## 2081       62.60306 -220.4589213 345.66505 -370.302884 495.5090
## 2082       62.60306 -222.0591213 347.26525 -372.750179 497.9563
## 2083       62.60306 -223.6503761 348.85650 -375.183793 500.3899
## 2084       62.60306 -225.2328340 350.43896 -377.603954 502.8101
## 2085       62.60306 -226.8066394 352.01277 -380.010881 505.2170
## 2086       62.60306 -228.3719326 353.57806 -382.404791 507.6109
## 2087       62.60306 -229.9288503 355.13498 -384.785891 509.9920
## 2088       62.60306 -231.4775255 356.68365 -387.154385 512.3605
## 2089       62.60306 -233.0180877 358.22421 -389.510472 514.7166
## 2090       62.60306 -234.5506631 359.75679 -391.854344 517.0605
## 2091       62.60306 -236.0753747 361.28150 -394.186189 519.3923
## 2092       62.60306 -237.5923423 362.79847 -396.506191 521.7123
## 2093       62.60306 -239.1016828 364.30781 -398.814528 524.0207
autoplot(damped)

Damped Holt method with Box-Cox transformation:

damped_bc <- holt(BoxCox(eggs, lambda), damped=TRUE, h=100)
summary(damped_bc)
## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = BoxCox(eggs, lambda), h = 100, damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.8356 
##     beta  = 1e-04 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 21.6922 
##     b = -0.1429 
## 
##   sigma:  1.068
## 
##      AIC     AICc      BIC 
## 446.2962 447.2617 461.5560 
## 
## Error measures:
##                    ME     RMSE       MAE        MPE   MAPE      MASE
## Training set -0.06684 1.039187 0.7829654 -0.7310197 4.5076 0.9603941
##                     ACF1
## Training set 0.005210529
## 
## Forecasts:
##      Point Forecast       Lo 80    Hi 80        Lo 95    Hi 95
## 1994      10.462309  9.09363872 11.83098  8.369108478 12.55551
## 1995      10.441227  8.65758545 12.22487  7.713382769 13.16907
## 1996      10.420566  8.30165254 12.53948  7.179967379 13.66117
## 1997      10.400319  7.99230959 12.80833  6.717586449 14.08305
## 1998      10.380477  7.71448490 13.04647  6.303194165 14.45776
## 1999      10.361031  7.45985933 13.26220  5.924071811 14.79799
## 2000      10.341974  7.22326308 13.46069  5.572317027 15.11163
## 2001      10.323299  7.00122194 13.64538  5.242620607 15.40398
## 2002      10.304997  6.79126714 13.81873  4.931210922 15.67878
## 2003      10.287061  6.59156914 13.98255  4.635293903 15.93883
## 2004      10.269484  6.40072699 14.13824  4.352730836 16.18624
## 2005      10.252258  6.21763928 14.28688  4.081841062 16.42268
## 2006      10.235377  6.04142117 14.42933  3.821275066 16.64948
## 2007      10.218833  5.87134884 14.56632  3.569929499 16.86774
## 2008      10.202621  5.70682100 14.69842  3.326888336 17.07835
## 2009      10.186732  5.54733148 14.82613  3.091380987 17.28208
## 2010      10.171162  5.39244930 14.94987  2.862751729 17.47957
## 2011      10.155903  5.24180373 15.07000  2.640436948 17.67137
## 2012      10.140949  5.09507300 15.18682  2.423947865 17.85795
## 2013      10.126294  4.95197566 15.30061  2.212857227 18.03973
## 2014      10.111932  4.81226367 15.41160  2.006788895 18.21707
## 2015      10.097857  4.67571710 15.52000  1.805409596 18.39030
## 2016      10.084064  4.54213976 15.62599  1.608422316 18.55971
## 2017      10.070547  4.41135569 15.72974  1.415560950 18.72553
## 2018      10.057300  4.28320633 15.83139  1.226585923 18.88801
## 2019      10.044318  4.15754817 15.93109  1.041280587 19.04735
## 2020      10.031595  4.03425072 16.02894  0.859448221 19.20374
## 2021      10.019128  3.91319496 16.12506  0.680909518 19.35735
## 2022      10.006909  3.79427188 16.21955  0.505500472 19.50832
## 2023       9.994935  3.67738136 16.31249  0.333070577 19.65680
## 2024       9.983200  3.56243110 16.40397  0.163481300 19.80292
## 2025       9.971700  3.44933585 16.49406 -0.003395237 19.94680
## 2026       9.960430  3.33801658 16.58284 -0.167677390 20.08854
## 2027       9.949386  3.22839990 16.67037 -0.329474976 20.22825
## 2028       9.938562  3.12041748 16.75671 -0.488890128 20.36601
## 2029       9.927955  3.01400552 16.84190 -0.646018040 20.50193
## 2030       9.917560  2.90910441 16.92601 -0.800947628 20.63607
## 2031       9.907372  2.80565825 17.00909 -0.953762101 20.76851
## 2032       9.897389  2.70361461 17.09116 -1.104539478 20.89932
## 2033       9.887605  2.60292415 17.17229 -1.253353035 21.02856
## 2034       9.878017  2.50354041 17.25249 -1.400271712 21.15631
## 2035       9.868621  2.40541957 17.33182 -1.545360468 21.28260
## 2036       9.859412  2.30852022 17.41030 -1.688680610 21.40751
## 2037       9.850388  2.21280316 17.48797 -1.830290072 21.53107
## 2038       9.841544  2.11823128 17.56486 -1.970243683 21.65333
## 2039       9.832877  2.02476936 17.64099 -2.108593395 21.77435
## 2040       9.824384  1.93238393 17.71638 -2.245388500 21.89416
## 2041       9.816060  1.84104320 17.79108 -2.380675817 22.01280
## 2042       9.807903  1.75071686 17.86509 -2.514499868 22.13031
## 2043       9.799909  1.66137606 17.93844 -2.646903039 22.24672
## 2044       9.792075  1.57299325 18.01116 -2.777925719 22.36208
## 2045       9.784397  1.48554214 18.08325 -2.907606439 22.47640
## 2046       9.776873  1.39899759 18.15475 -3.035981985 22.58973
## 2047       9.769500  1.31333553 18.22566 -3.163087515 22.70209
## 2048       9.762274  1.22853296 18.29601 -3.288956659 22.81350
## 2049       9.755193  1.14456778 18.36582 -3.413621610 22.92401
## 2050       9.748253  1.06141884 18.43509 -3.537113212 23.03362
## 2051       9.741452  0.97906581 18.50384 -3.659461042 23.14236
## 2052       9.734787  0.89748919 18.57208 -3.780693478 23.25027
## 2053       9.728255  0.81667020 18.63984 -3.900837773 23.35735
## 2054       9.721854  0.73659080 18.70712 -4.019920113 23.46363
## 2055       9.715581  0.65723362 18.77393 -4.137965679 23.56913
## 2056       9.709433  0.57858194 18.84028 -4.254998700 23.67386
## 2057       9.703408  0.50061964 18.90620 -4.371042501 23.77786
## 2058       9.697504  0.42333116 18.97168 -4.486119555 23.88113
## 2059       9.691718  0.34670151 19.03673 -4.600251525 23.98369
## 2060       9.686048  0.27071622 19.10138 -4.713459302 24.08556
## 2061       9.680491  0.19536129 19.16562 -4.825763048 24.18674
## 2062       9.675045  0.12062322 19.22947 -4.937182228 24.28727
## 2063       9.669708  0.04648894 19.29293 -5.047735646 24.38715
## 2064       9.664478 -0.02705419 19.35601 -5.157441475 24.48640
## 2065       9.659353 -0.10001839 19.41872 -5.266317287 24.58502
## 2066       9.654330 -0.17241550 19.48107 -5.374380083 24.68304
## 2067       9.649407 -0.24425700 19.54307 -5.481646313 24.78046
## 2068       9.644583 -0.31555399 19.60472 -5.588131908 24.87730
## 2069       9.639855 -0.38631724 19.66603 -5.693852297 24.97356
## 2070       9.635222 -0.45655719 19.72700 -5.798822434 25.06927
## 2071       9.630682 -0.52628399 19.78765 -5.903056813 25.16442
## 2072       9.626232 -0.59550747 19.84797 -6.006569491 25.25903
## 2073       9.621871 -0.66423716 19.90798 -6.109374108 25.35312
## 2074       9.617598 -0.73248234 19.96768 -6.211483899 25.44668
## 2075       9.613410 -0.80025202 20.02707 -6.312911714 25.53973
## 2076       9.609306 -0.86755495 20.08617 -6.413670034 25.63228
## 2077       9.605284 -0.93439963 20.14497 -6.513770984 25.72434
## 2078       9.601342 -1.00079435 20.20348 -6.613226347 25.81591
## 2079       9.597479 -1.06674714 20.26171 -6.712047578 25.90701
## 2080       9.593694 -1.13226584 20.31965 -6.810245817 25.99763
## 2081       9.589984 -1.19735806 20.37733 -6.907831901 26.08780
## 2082       9.586348 -1.26203123 20.43473 -7.004816372 26.17751
## 2083       9.582785 -1.32629257 20.49186 -7.101209494 26.26678
## 2084       9.579293 -1.39014911 20.54874 -7.197021257 26.35561
## 2085       9.575872 -1.45360771 20.60535 -7.292261391 26.44400
## 2086       9.572518 -1.51667505 20.66171 -7.386939374 26.53198
## 2087       9.569232 -1.57935765 20.71782 -7.481064441 26.61953
## 2088       9.566011 -1.64166184 20.77368 -7.574645592 26.70667
## 2089       9.562855 -1.70359384 20.82930 -7.667691601 26.79340
## 2090       9.559762 -1.76515966 20.88468 -7.760211023 26.87973
## 2091       9.556731 -1.82636521 20.93983 -7.852212202 26.96567
## 2092       9.553760 -1.88721624 20.99474 -7.943703278 27.05122
## 2093       9.550849 -1.94771836 21.04942 -8.034692192 27.13639
autoplot(damped_bc)

Note that the RMSE of the Box-Cox fits (about 1.03–1.04) is measured on the transformed scale, so it is not directly comparable to the RMSE of the damped model fitted to the original data (26.54). On the transformed scale, adding damping did not improve on the plain Holt fit (RMSE 1.039 versus 1.032, with a slightly higher AICc), so among the options tried here the Holt method with the Box-Cox transformation (using the optimal lambda of 0.3956) looks best. The transformation helps because the variance of the series is not constant over time. A fairer comparison would put all of the candidates on the original scale, as sketched below.
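
A minimal sketch of that fairer comparison, using the lambda (and biasadj) arguments of holt() so the forecasts are automatically back-transformed and the error measures are comparable on the original scale; the choice of biasadj = TRUE is an assumption, made so the back-transformed forecasts are means rather than medians:

# Fit all four candidates so that accuracy() reports errors on the original scale
fit_plain     <- holt(eggs, h = 100)
fit_damped    <- holt(eggs, damped = TRUE, h = 100)
fit_bc        <- holt(eggs, lambda = lambda, biasadj = TRUE, h = 100)
fit_damped_bc <- holt(eggs, damped = TRUE, lambda = lambda, biasadj = TRUE, h = 100)
accuracy(fit_plain)[, "RMSE"]
accuracy(fit_damped)[, "RMSE"]
accuracy(fit_bc)[, "RMSE"]
accuracy(fit_damped_bc)[, "RMSE"]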

7.8 Recall your retail time series data (from Exercise 3 in Section 2.10).

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

# Category: Turnover ;  Western Australia ;  Furniture, floor coverings, houseware and textile goods retailing - this was from my previous three homework assignments.
myts <- ts(retaildata[,"A3349661X"],
           frequency=12, start=c(1982,4))

A. Why is multiplicative seasonality necessary for this series?

Let’s plot out the series.

autoplot(myts) + ggtitle("Retail Data Time Series")

As the plot shows, the seasonal variations grow larger as the level of the series increases, so multiplicative seasonality is necessary here; additive seasonality is only appropriate when the seasonal variation is roughly constant over time.

Reference: http://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/

Let us initially decompose this data using the decompose() function.

autoplot(decompose(myts,type="multiplicative"))

There clearly is a seasonal component with an upward trend.

B. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

According to section 7.3, “Holt-Winters’ Seasonal Method”, this is an extension of Holt’s method to capture seasonality. “The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations - one for the level \(l_t\), one for the trend, \(b_t\), and one for the seasonal component \(s_t\), with corresponding smoothing parameters \(\alpha, \beta^*, \text{and } \gamma\). We use m to denote the frequency of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data, \(m=4\), and for monthly data \(m=12\).”

“There are two variations to this method that differ in the nature of the seasonal component. The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the additive method, the seasonal component is expressed in absolute terms in the scale of the observed series, and in the level equation the series is seasonally adjusted by subtracting the seasonal component. Within each year, the seasonal component will add up to approximately zero. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component. Within each year, the seasonal component will sum up to approximately m.”

“The component form for the additive method is:”

\[\hat{y}_{t+h|t}=l_t+hb_t+s_{t+h-m(k+1)}\] \[l_t=\alpha (y_t-s_{t-m})+(1-\alpha)(l_{t-1}+b_{t-1})\] \[b_t=\beta^*(l_t-l_{t-1})+(1-\beta^*)b_{t-1}\] \[s_t=\gamma(y_t-l_{t-1}-b_{t-1})+(1-\gamma)s_{t-m}\] “where k is the integer part of \((h-1)/m\), which ensures that the estimates of the seasonal indices used for forecasting come from the final year of the sample. The level equation shows a weighted average between the seasonally adjusted observation \(y_t-s_{t-m}\) and the non-seasonal forecast \((l_{t-1}+b_{t-1})\) for time t. The trend equation is identical to Holt’s linear method. The seasonal equation shows a weighted average between the current seasonal index, \((y_t-l_{t-1}-b_{t-1})\), and the seasonal index of the same season last year (i.e., m time periods ago).”

The component form for the multiplicative method is:

\[\hat{y}_{t+h|t}=(l_t+hb_t)s_{t+h-m(k+1)}\] \[l_t=\alpha \frac{y_t}{s_{t-m}} +(1-\alpha)(l_{t-1}+b_{t-1})\] \[b_t=\beta^*(l_t-l_{t-1})+(1-\beta^*)b_{t-1}\] \[s_t=\gamma \frac{y_t}{(l_{t-1}+b_{t-1})} +(1-\gamma)s_{t-m}\]

Given that we are dealing with a multiplicative series, we will be using the Holt-Winters’ multiplicative method as described.

fit_hw <- hw(myts, seasonal="multiplicative",damped=FALSE)
summary(fit_hw)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.581 
##     beta  = 0.0146 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 19.4276 
##     b = -0.1358 
##     s = 0.9298 0.8839 0.9697 1.1372 1.0888 1.0706
##            1.0107 0.9993 1.0149 1.0267 0.978 0.8902
## 
##   sigma:  0.0755
## 
##      AIC     AICc      BIC 
## 3346.914 3348.600 3413.941 
## 
## Error measures:
##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.1358893 4.859899 3.397537 0.001891542 5.684886 0.4111136
##                     ACF1
## Training set -0.07382362
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2014       151.7305 137.0487 166.4123 129.27665 174.1843
## Feb 2014       138.8964 123.2586 154.5341 114.98044 162.8123
## Mar 2014       146.7200 128.0858 165.3541 118.22151 175.2184
## Apr 2014       141.0709 121.2561 160.8857 110.76676 171.3750
## May 2014       155.6390 131.7931 179.4848 119.16991 192.1081
## Jun 2014       164.0470 136.9084 191.1856 122.54205 205.5519
## Jul 2014       162.8631 133.9982 191.7280 118.71801 207.0082
## Aug 2014       161.0243 130.6385 191.4102 114.55314 207.4955
## Sep 2014       163.5302 130.8401 196.2203 113.53504 213.5254
## Oct 2014       173.9624 137.2774 210.6475 117.85757 230.0673
## Nov 2014       177.6277 138.2528 217.0025 117.40907 237.8463
## Dec 2014       186.2875 143.0115 229.5636 120.10256 252.4725
## Jan 2015       159.5075 120.7763 198.2387 100.27320 218.7418
## Feb 2015       145.9853 109.0197 182.9509  89.45123 202.5193
## Mar 2015       154.1765 113.5481 194.8049  92.04067 216.3122
## Apr 2015       148.2100 107.6384 188.7817  86.16108 210.2590
## May 2015       163.4823 117.0693 209.8954  92.49967 234.4650
## Jun 2015       172.2795 121.6285 222.9304  94.81548 249.7435
## Jul 2015       171.0022 119.0074 222.9969  91.48300 250.5213
## Aug 2015       169.0381 115.9477 222.1286  87.84327 250.2330
## Sep 2015       171.6351 116.0158 227.2544  86.57275 256.6975
## Oct 2015       182.5489 121.5758 243.5220  89.29855 275.7993
## Nov 2015       186.3591 122.2623 250.4559  88.33154 284.3867
## Dec 2015       195.4073 126.2610 264.5537  89.65709 301.1575
autoplot(fit_hw) + ggtitle("Holt Winter Multiplicative Fit for MYTS")

Now let’s experiment with the damped Holt-Winters’ multiplicative method.

fit_hw2 <- hw(myts, seasonal="multiplicative",damped=TRUE)
summary(fit_hw2)
## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.4886 
##     beta  = 0.017 
##     gamma = 1e-04 
##     phi   = 0.9618 
## 
##   Initial states:
##     l = 19.4102 
##     b = -0.0203 
##     s = 0.9302 0.8819 0.9689 1.1379 1.0896 1.0694
##            1.0119 1 1.0151 1.026 0.9767 0.8924
## 
##   sigma:  0.0756
## 
##      AIC     AICc      BIC 
## 3346.966 3348.855 3417.936 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.4006301 4.847269 3.401302 0.2947955 5.676417 0.4115693
##                     ACF1
## Training set 0.009577922
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2014       151.1860 136.5317 165.8403 128.77417 173.5978
## Feb 2014       137.8184 122.8455 152.7912 114.91937 160.7174
## Mar 2014       145.5703 128.1229 163.0176 118.88689 172.2536
## Apr 2014       139.8699 121.5899 158.1498 111.91314 167.8266
## May 2014       153.2848 131.6355 174.9341 120.17503 186.3946
## Jun 2014       161.2102 136.7801 185.6403 123.84763 198.5728
## Jul 2014       159.7206 133.9018 185.5395 120.23408 199.2072
## Aug 2014       157.5212 130.4921 184.5504 116.18372 198.8588
## Sep 2014       159.5724 130.6283 188.5165 115.30628 203.8385
## Oct 2014       168.8528 136.5933 201.1123 119.51617 218.1894
## Nov 2014       172.2165 137.6697 206.7634 119.38171 225.0514
## Dec 2014       180.0537 142.2340 217.8733 122.21353 237.8938
## Jan 2015       153.4668 119.7965 187.1371 101.97249 204.9611
## Feb 2015       139.8151 107.8454 171.7848  90.92166 188.7086
## Mar 2015       147.5958 112.4928 182.6988  93.91039 201.2812
## Apr 2015       141.7392 106.7399 176.7385  88.21232 195.2661
## May 2015       155.2525 115.5160 194.9891  94.48067 216.0244
## Jun 2015       163.1981 119.9675 206.4287  97.08261 229.3136
## Jul 2015       161.6126 117.3670 205.8583  93.94478 229.2805
## Aug 2015       159.3138 114.2936 204.3341  90.46127 228.1664
## Sep 2015       161.3170 114.3194 208.3146  89.44036 233.1937
## Oct 2015       170.6264 119.4346 221.8183  92.33524 248.9176
## Nov 2015       173.9546 120.2638 227.6455  91.84162 256.0677
## Dec 2015       181.7997 124.1302 239.4693  93.60176 269.9977
autoplot(fit_hw2) + ggtitle("Holt Winter Damped Multiplicative Fit for MYTS")

Comparing the RMSE, AIC, and BIC, the two methods are very similar: the damped method has a marginally lower training RMSE (4.847 versus 4.860) but a marginally higher AICc, so damping provides at most a trivial improvement here.

C. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

Similar to the previous exercises, we’ll use forecast() using \(h=1\) and determine the RMSE.

# Non-damped Holt Winter's Multiplicative Method
summary(forecast(fit_hw, h=1))
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.581 
##     beta  = 0.0146 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 19.4276 
##     b = -0.1358 
##     s = 0.9298 0.8839 0.9697 1.1372 1.0888 1.0706
##            1.0107 0.9993 1.0149 1.0267 0.978 0.8902
## 
##   sigma:  0.0755
## 
##      AIC     AICc      BIC 
## 3346.914 3348.600 3413.941 
## 
## Error measures:
##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.1358893 4.859899 3.397537 0.001891542 5.684886 0.4111136
##                     ACF1
## Training set -0.07382362
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       151.7305 137.0487 166.4123 129.2766 174.1843
# Damped Holt Winter's Multiplicative Method
summary(forecast(fit_hw2, h=1))
## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.4886 
##     beta  = 0.017 
##     gamma = 1e-04 
##     phi   = 0.9618 
## 
##   Initial states:
##     l = 19.4102 
##     b = -0.0203 
##     s = 0.9302 0.8819 0.9689 1.1379 1.0896 1.0694
##            1.0119 1 1.0151 1.026 0.9767 0.8924
## 
##   sigma:  0.0756
## 
##      AIC     AICc      BIC 
## 3346.966 3348.855 3417.936 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.4006301 4.847269 3.401302 0.2947955 5.676417 0.4115693
##                     ACF1
## Training set 0.009577922
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014        151.186 136.5317 165.8403 128.7742 173.5978

I suspect the damped and non-damped methods are so similar because damping mainly flattens the forecast trend over longer horizons; with \(h=1\), the damping effect is negligible. This also explains why the point forecasts and prediction intervals are nearly identical. Since the damped model's training RMSE (4.847) is marginally lower than the non-damped model's (4.860), I would prefer the damped method, if only slightly.
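Because the one-step point forecasts are just the fitted values, the training-set RMSE is the relevant number here. A minimal sketch to pull the two values directly, assuming fit_hw and fit_hw2 from above:

# One-step (training) RMSE for each method
c(`non-damped` = accuracy(fit_hw)["Training set", "RMSE"],
  damped       = accuracy(fit_hw2)["Training set", "RMSE"])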

D. Check that the residuals from the best method look like white noise.

According to section 2.9, “White Noise”: “time series that show no autocorrelation are called white noise. For white noise series, we expect each autocorrelation to be close to zero. For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm 2/\sqrt{T}\), where T is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines). If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise.”

We will use the damped method as the RMSE was slightly better.

# ACF of the residuals from the damped fit
ggAcf(residuals(fit_hw2), lag.max=50)

Looking at the residual ACF, most spikes fall within the 95% bounds (with the exception of a few early lags), but it is hard to tell by eye whether more than 5% of the spikes lie outside them. Is there a more exact way to check whether the residuals behave like white noise?
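One rough way to quantify this is to count how many residual autocorrelations fall outside the \(\pm 2/\sqrt{T}\) bounds. A small sketch, assuming fit_hw2 from above:

# Proportion of residual ACF spikes outside the +/- 2/sqrt(T) bounds
res <- residuals(fit_hw2)
bound <- 2 / sqrt(length(res))
r <- Acf(res, lag.max = 50, plot = FALSE)$acf[-1]  # drop the lag-0 value
mean(abs(r) > bound)  # compare this proportion against 5%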

A more formal check is the Ljung-Box test, as shown in section 3.3, “Residual Diagnostics”. (Remember, we must pass the residuals of the fitted model to the Box.test() function.)

Box.test(residuals(fit_hw2),lag=50,fitdf=0, type="Lj")
## 
##  Box-Ljung test
## 
## data:  residuals(fit_hw2)
## X-squared = 87.661, df = 50, p-value = 0.0007894

As the p-value is less than 0.05, we reject the null hypothesis that the residuals are indistinguishable from white noise; the residuals of the damped model therefore do not look like white noise.
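Note that fitdf = 0 above does not account for the parameters estimated by the model. The checkresiduals() function, used again below, runs the same Ljung-Box test with the model degrees of freedom filled in and also plots the residuals and their ACF:

# Combined residual diagnostics for the damped fit
checkresiduals(fit_hw2)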

E. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?

Recall that Exercise 8 in Section 3.7 concerned the retail time series myts. That exercise had us create a seasonal naive model and then check its accuracy and residuals. Let us reproduce that here.

# splitting into training and testing data
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

fc_snaive <- snaive(myts.train)
accuracy(fc_snaive, myts.test)
##                     ME     RMSE       MAE       MPE     MAPE     MASE
## Training set  3.518619 10.42741  7.244144  5.375707 11.81551 1.000000
## Test set     30.512500 33.10478 30.512500 20.388974 20.38897 4.212023
##                   ACF1 Theil's U
## Training set 0.7935927        NA
## Test set     0.3613754   2.66608
checkresiduals(fc_snaive)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 1059, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The test set RMSE is 33.10478.

Can we beat the seasonal naive model?

# Damped Holt-Winters model on the training set
# (note: hw() takes `seasonal=`, not `type=`, so this call falls back to the
#  default additive seasonality, as the checkresiduals() output below reports)
fc_damped <- hw(myts.train, type="multiplicative",damped=TRUE)
accuracy(fc_damped, myts.test)
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set  0.5438005  4.870502  3.443183 0.6596744 6.315223 0.4753057
## Test set     11.8175662 16.740096 13.874180 7.2518408 8.973145 1.9152269
##                    ACF1 Theil's U
## Training set 0.04657103        NA
## Test set     0.48283129  1.273536
checkresiduals(fc_damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 45.451, df = 7, p-value = 1.118e-07
## 
## Model df: 17.   Total lags used: 24

This model is a clearly better predictor on the test set: its RMSE of 16.74 is roughly half the seasonal naive RMSE of 33.10. The plot below overlays the damped Holt-Winters forecasts on the training and test data.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test") +
  autolayer(fc_damped, series="Damped HW Model")
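As noted in the comment above, the type argument is not part of hw(); a genuinely multiplicative damped fit would be specified with the seasonal argument. A sketch for comparison (fc_damped_mult is an illustrative name; output not reproduced here):

# Damped Holt-Winters fit with multiplicative seasonality on the training set
fc_damped_mult <- hw(myts.train, seasonal = "multiplicative", damped = TRUE)
accuracy(fc_damped_mult, myts.test)  # test-set RMSE comparable to the fits above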

7.9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Recall from section 6.6, “STL decomposition”: “STL is an acronym for Seasonal and Trend decomposition using Loess, while Loess is a method for estimating nonlinear relationships. STL has some disadvantages as…it does not handle trading day or calendar variation automatically, and it only provides facilities for additive decompositions.”

“It is possible to obtain a multiplicative decomposition by first taking logs of the data, then back-transforming the components. Decompositions between additive and multiplicative can be obtained using a Box-Cox transformation of the data with \(0 < \lambda < 1\). A value of \(\lambda = 0\) corresponds to the multiplicative decomposition while \(\lambda = 1\) is equivalent to an additive decomposition.”
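To make the \(\lambda = 0\) case concrete, BoxCox() with \(\lambda = 0\) reduces to the natural logarithm; a quick illustrative check:

# lambda = 0 is the log transform (as.numeric() drops the lambda attribute BoxCox() attaches)
x <- c(10, 100, 1000)
all.equal(as.numeric(BoxCox(x, lambda = 0)), log(x))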

Here, we will first apply a Box-Cox transformation (effectively turning this multiplicative series into an additive one, so that STL, which only provides additive decompositions, can be applied appropriately), and then perform the STL decomposition. After decomposing, we will remove the seasonal component with seasadj(), leaving a series that contains only the trend and remainder components.

# Initially, I attempted to pass myts through stl(), but it returned an error stating "only univariate series are allowed." Therefore, I created a new ts from the same data, following this Stack Overflow reference:

# https://stackoverflow.com/questions/28245059/stl-decomposition-wont-accept-univariate-ts-object/31721358

tsData <- ts(
  data=retaildata$A3349661X,
  frequency=12, 
  start=c(1982,4)
)

# Find lambda
lambda <- BoxCox.lambda(tsData)

stl_retail <- tsData %>% 
  BoxCox(lambda = lambda) %>% 
  stl(t.window=13, s.window="periodic", robust=TRUE) %>% seasadj()
  
summary(stl_retail)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.619   3.142   3.653   3.608   4.033   4.529
autoplot(stl_retail) + ggtitle("STL Decomposition of myts (Seasonally Adjusted)")

Clearly, there is an upward trend, even after adjusting for seasonality.

For the second portion of the question (regarding using the ets() function):

From sections 7.6 and 7.7, “Forecasting with ETS models”, we can use the ets() function to select an appropriate ETS model automatically. Remember that ETS point forecasts are equal to the medians of the forecast distributions. For a better understanding of the taxonomy of ETS models, see section 7.4.

We will fit the ETS model to the Box-Cox transformed, seasonally adjusted time series.

ets_model <- stl_retail %>% ets(model="ZZZ")
summary(ets_model)
## ETS(A,A,N) 
## 
## Call:
##  ets(y = ., model = "ZZZ") 
## 
##   Smoothing parameters:
##     alpha = 0.5066 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 2.8107 
##     b = 0.0044 
## 
##   sigma:  0.0585
## 
##     AIC    AICc     BIC 
## 106.758 106.918 126.472 
## 
## Training set error measures:
##                         ME       RMSE        MAE         MPE    MAPE
## Training set -4.894186e-05 0.05816795 0.04539999 -0.02577639 1.30073
##                   MASE        ACF1
## Training set 0.4443646 -0.01057731
autoplot(ets_model) + ggtitle("ETS Model for myts")

The automatic selection returned an ETS(A,A,N) model, which is Holt's linear method with additive errors. This makes sense: the seasonal component has already been removed from the seasonally adjusted series, so only the trend and remainder remain, and a non-seasonal trend model is appropriate. The training set RMSE is 0.0582, but this is on the Box-Cox transformed scale, so it cannot be compared directly with the earlier test-set RMSEs (16.74 for the damped Holt-Winters model and 33.10 for the seasonal naive method), which are on the original scale. A fair comparison requires refitting on the training period, back-transforming the forecasts to the original scale, and computing the test-set accuracy.
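A minimal sketch of that comparison, using stlf() to combine the Box-Cox transformation, the STL decomposition, an ETS model on the seasonally adjusted series (stlf()'s default), and the back-transformation in one step (assuming tsData, lambda, and myts.test from above; train and fc_stl_ets are illustrative names):

# Fit on the training period and forecast over the test period on the original scale
train <- window(tsData, end = c(2010, 12))
fc_stl_ets <- stlf(train, h = length(myts.test),
                   s.window = "periodic", robust = TRUE, lambda = lambda)
accuracy(fc_stl_ets, myts.test)  # test-set RMSE is now directly comparable to the earlier models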