1 HW5: Exponential Smoothing

Do exercises 7.1, 7.5, 7.6, 7.7, 7.8 and 7.9 in Hyndman. Please submit both the link to your Rpubs and the .rmd file.

1.1 Ex. 7.1

Consider the pigs series — the number of pigs slaughtered in Victoria each month.

(a.) Use the ses() function in \(R\) to find the optimal values of \(\alpha\) and \(l_{0}\), and generate forecasts for the next four months.

(b.) Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by \(R\).

1.1.1 Part a

Use the ses() function in \(R\) to find the optimal values of \(\alpha\) and \(l_{0}\), and generate forecasts for the next four months.

Answer:

  • From the ses() output below, we got \(\alpha = 0.2971\) and \(l_{0} = 77260.0561\).

  • The forecasts for the next four months are shown below in table form and in graph form.
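The output below comes from a call of the form ses(pigs, h = 4), as shown in the model summary. A minimal sketch, assuming the fpp2 package (which provides both the pigs series and ses()) is loaded:

```r
library(fpp2)                 # provides the pigs series and ses()

fc_pigs <- ses(pigs, h = 4)   # SES with optimised alpha and l0, 4-month horizon
summary(fc_pigs)              # smoothing parameter, initial state, and forecasts

# plot the series, the fitted values, and the 4-month forecasts
autoplot(fc_pigs) +
  autolayer(fitted(fc_pigs), series = "Fitted")
```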

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
##                    ACF1
## Training set 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

1.1.2 Part b

Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by \(R\).

Answer:

  • From the \(R\) summary table below, the 95% prediction interval for the first forecast is \([78611.97, 119020.8]\).

  • By calculation using \(\hat{y}\pm 1.96s\), the 95% prediction interval for the first forecast is \([78679.97, 118952.84]\).

  • The interval computed by the formula is slightly narrower than the one produced by \(R\); the small difference arises because \(R\) bases its interval on the model’s estimated \(\sigma\) (10308.58 in the summary above), which is a little larger than the raw standard deviation of the residuals.
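A minimal sketch of that hand calculation, reusing the fc_pigs object from part a:

```r
s    <- sd(residuals(fc_pigs))    # standard deviation of the residuals
yhat <- fc_pigs$mean[1]           # first point forecast

c(lower = yhat - 1.96 * s, upper = yhat + 1.96 * s)   # manual 95% interval

# interval reported by R, for comparison
c(fc_pigs$lower[1, "95%"], fc_pigs$upper[1, "95%"])
```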

##       95%       95% 
##  78611.97 119020.84
## [1]  78679.97 118952.84

1.2 Ex. 7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

(a.) Plot the series and discuss the main features of the data.

(b.) Use the ses() function to forecast each series, and plot the forecasts.

(c.) Compute the RMSE values for the training data in each case.

1.2.1 Part a

Plot the series and discuss the main features of the data.

Answer:

  • Over the 30 days, there is an increasing trend in sales of both paperback and hardcover books.

  • Because of the large day-to-day fluctuations in sales, it is hard to see any seasonal or cyclic pattern in either series.
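A minimal sketch of the plot (the books data set ships with fpp2; both series are drawn on one panel):

```r
library(fpp2)

autoplot(books) +
  ggtitle("Daily sales of paperback and hardcover books") +
  xlab("Day") + ylab("Sales")
```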

1.2.3 Part c

Compute the RMSE values for the training data in each case.

Answer:

  • The training-set RMSE for the SES forecast of Paperback sales is 33.6377.

  • The training-set RMSE for the SES forecast of Hardcover sales is 31.9310.
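A sketch of one way to obtain these values with accuracy() (the object names are my own):

```r
ses_paper <- ses(books[, "Paperback"], h = 4)
ses_hard  <- ses(books[, "Hardcover"], h = 4)

accuracy(ses_paper)["Training set", "RMSE"]   # ~33.64
accuracy(ses_hard)["Training set", "RMSE"]    # ~31.93
```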

## [1] "The RMSE for the `ses` Paperback books training data is 33.637686782912"
## [1] "The RMSE for the `ses` Hardcover books training data is 31.9310149844547"

1.3 Ex. 7.6

(a.) Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

(b.) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

(c.) Compare the forecasts for the two series using both methods. Which do you think is best?

(d.) Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

1.3.2 Part b

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

Answer:

  • The RMSE measures of Holt’s method for the two series are smaller than those of simple exponential smoothing, so Holt’s method fits the training data better.

  • Holt’s linear method uses one more parameter than SES: \(b_{t}\), an estimate of the trend (slope) of the series at time \(t\). This allows it to forecast data with a trend, which in these series is an increasing one.

  • Therefore, Holt’s linear method performs better than simple exponential smoothing for these data sets.
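A sketch of the Holt fits and the RMSE comparison, reusing ses_paper and ses_hard from Ex. 7.5:

```r
holt_paper <- holt(books[, "Paperback"], h = 4)
holt_hard  <- holt(books[, "Hardcover"], h = 4)

# training-set RMSE for each method and series
data.frame(
  Method    = c("SES", "Holt"),
  Paperback = c(accuracy(ses_paper)["Training set", "RMSE"],
                accuracy(holt_paper)["Training set", "RMSE"]),
  Hardcover = c(accuracy(ses_hard)["Training set", "RMSE"],
                accuracy(holt_hard)["Training set", "RMSE"])
)
```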

## [1] "The RMSE for the SES Paperback and Hardcover training data is 33.637686782912 and 31.9310149844547"
## [1] "The RMSE for the Holt's Paperback and Hardcover training data is 31.1369230162347 and 27.1935779818511"

1.3.3 Part c

Compare the forecasts for the two series using both methods. Which do you think is best?

Answer:

  • From the graph below, it is clear that Holt’s linear method captures the increasing trend in its forecasts, whereas SES produces flat forecasts from the same data. Holt’s linear method also has the smaller training RMSE.

  • Therefore, Holt’s linear method is the better choice for these series.
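A sketch of the comparison plot for the Paperback series (the Hardcover plot is built the same way):

```r
autoplot(books[, "Paperback"]) +
  autolayer(ses_paper,  series = "SES",  PI = FALSE) +
  autolayer(holt_paper, series = "Holt", PI = FALSE) +
  ggtitle("Paperback sales: SES vs. Holt forecasts") +
  ylab("Sales")
```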

1.3.4 Part d

Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

Answer:

  • By using RMSE values and assuming normal errors, we have the 95% prediction intervals:

        Method            Lower 95   Upper 95
        SES-Paperback     141.1798   273.0395
        SES-Hardcover     176.9753   302.1449
        Holt-Paperback    148.4384   270.4951
        Holt-Hardcover    196.8745   303.4733
  • By using \(R\), we have the 95% prediction intervals:

        Method            Lower 95   Upper 95
        SES-Paperback     138.8670   275.3523
        SES-Hardcover     174.7799   304.3403
        Holt-Paperback    143.9130   275.0205
        Holt-Hardcover    192.9222   307.4256

  • In every case, the interval computed by hand from the RMSE is slightly narrower than the corresponding interval produced by \(R\).
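A sketch of the manual interval for the first SES forecast of the Paperback series; the other three intervals are computed the same way:

```r
rmse_ses_paper <- accuracy(ses_paper)["Training set", "RMSE"]

ses_paper$mean[1] + c(-1.96, 1.96) * rmse_ses_paper   # manual 95% interval

# interval reported by R, for comparison
c(ses_paper$lower[1, "95%"], ses_paper$upper[1, "95%"])
```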
## [1] "95% PI of SES Paperback: ( 141.179798900711 , 273.039531089726 )"
## [1] "95% PI of SES Hardcover: ( 176.97530263223 , 302.144881371293 )"
## [1] "95% PI of Holt's Paperback: ( 148.438396409231 , 270.495134632871 )"
## [1] "95% PI of Holt's Hardcover: ( 196.874459367927 , 303.473285056784 )"
## [1] "95% PI of SES Paperback: ( 138.867024106993 , 275.352305883445 )"
## [1] "95% PI of SES Hardcover: ( 174.779870851487 , 304.340313152036 )"
## [1] "95% PI of Holt's Paperback: ( 143.912985820256 , 275.020545221845 )"
## [1] "95% PI of Holt's Hardcover: ( 192.922170771941 , 307.42557365277 )"

1.4 Ex. 7.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

1.4.1 Part a

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

Answer:

  • The eggs dataset has an overall decreasing trend, with no obvious seasonal or cyclic pattern.

  • Using the holt() function, I forecast with (1) Holt’s linear method, (2) Holt’s method with a damped trend, (3) Holt’s method with a Box-Cox transformation, and (4) Holt’s method with a Box-Cox transformation plus a damped trend. All four are plotted on the same forecast graph below.
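A sketch of the four variants (eggs ships with fpp2; BoxCox.lambda() picks the transformation parameter automatically, and accuracy() gives the training-set RMSE reported in part b):

```r
lambda <- BoxCox.lambda(eggs)

fc_holt    <- holt(eggs, h = 100)                                  # plain Holt
fc_damped  <- holt(eggs, h = 100, damped = TRUE)                   # damped trend
fc_bc      <- holt(eggs, h = 100, lambda = lambda)                 # Box-Cox transformation
fc_bc_damp <- holt(eggs, h = 100, lambda = lambda, damped = TRUE)  # both

autoplot(eggs) +
  autolayer(fc_holt,    series = "Holt",             PI = FALSE) +
  autolayer(fc_damped,  series = "Damped",           PI = FALSE) +
  autolayer(fc_bc,      series = "Box-Cox",          PI = FALSE) +
  autolayer(fc_bc_damp, series = "Box-Cox + damped", PI = FALSE)
```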

1.4.2 Part b

Which model gives the best RMSE?

Answer:

  • The training-set RMSE of each method is:

        Method                RMSE
        Holt_boxcox           26.3938
        Holt_boxcox_damped    26.5332
        Holt_damped           26.5402
        Holt                  26.5822

  • Holt’s linear method with a Box-Cox transformation performs best: it has the lowest RMSE, 26.3938.

## [1] "RMSE of Holt's:  26.5821881569903"
## [1] "RMSE of damped Holt's:  26.5401852798325"
## [1] "RMSE of Boxcox Holt's:  26.3937555344296"
## [1] "RMSE of Boxcox damped Holt's:  26.5332106346358"

1.5 Ex. 7.8

Recall your retail time series data (from Exercise 3 in Section 2.10).

(a.) Why is multiplicative seasonality necessary for this series?

(b.) Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

(c.) Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

(d.) Check that the residuals from the best method look like white noise.

(e.) Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?

1.5.1 Part a

Why is multiplicative seasonality necessary for this series?

Answer:

  • From Section 7.3 of our textbook, the multiplicative method is preferred when the seasonal variations are changing proportionally to the level of the series. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component.

  • The multiplicative method is necessary here because the seasonal variation in this retail series grows roughly in proportion to the level of the series over time.

1.5.3 Part c

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

Answer:

  • The one-step (training) RMSE of the two methods is:

        Method                                  RMSE
        Holt-Winters’ multiplicative            31.6938
        Holt-Winters’ multiplicative, damped    31.5360

  • I prefer the damped Holt-Winters’ multiplicative method, as it has the smaller RMSE.

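A sketch of the two fits, with myts standing in for the retail series loaded in Exercise 3 of Section 2.10 (the object name is my own, not from the original code):

```r
# myts is assumed to hold the monthly retail time series
fit_hw        <- hw(myts, seasonal = "multiplicative")
fit_hw_damped <- hw(myts, seasonal = "multiplicative", damped = TRUE)

accuracy(fit_hw)["Training set", "RMSE"]
accuracy(fit_hw_damped)["Training set", "RMSE"]
```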
## [1] "RMSE of Holt-Winter's multiplicative method:  31.6937872890862"
## [1] "RMSE of Holt-Winter's multiplicative damped method:  31.5359595784103"

1.5.4 Part d

Check that the residuals from the best method look like white noise.

Answer:

Let’s study the residuals from the damped Holt-Winters’ multiplicative method, since it is the better model from part c.

  • There are a few large positive and negative spikes in the residual time plot between 1995 and 2001.

  • The ACF plot shows several significant autocorrelations among the residuals.

  • The histogram of the residuals is roughly normal, with a mean very close to 0.

  • The p-value from the Ljung-Box test is essentially 0.

\(\therefore\) Since the residuals show significant autocorrelations and the Ljung-Box test strongly rejects the white-noise hypothesis, the residuals do not look like white noise.
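The diagnostics above come from checkresiduals(), which produces the time plot, ACF, histogram, and Ljung-Box test in one call; a sketch, reusing fit_hw_damped from part c:

```r
checkresiduals(fit_hw_damped)
```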

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 341.35, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24

1.5.5 Part e

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?

Answer:

  • The RMSE of the three methods is:

        Method                                  Training Set    Test Set
        Holt-Winters’ multiplicative                28.76835    257.16472
        Holt-Winters’ multiplicative, damped        28.82823    320.82257
        Seasonal naive                              83.42504    381.15950

  • Yes, both Holt-Winters’ methods beat the seasonal naive approach on the test set; the (non-damped) Holt-Winters’ multiplicative method performs best, with the lowest test-set RMSE.

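A sketch of the train/test split and the test-set comparison, again using the assumed myts object:

```r
train <- window(myts, end = c(2010, 12))   # train up to the end of 2010
test  <- window(myts, start = 2011)
h     <- length(test)

fc_hw     <- hw(train, seasonal = "multiplicative", h = h)
fc_hw_dmp <- hw(train, seasonal = "multiplicative", damped = TRUE, h = h)
fc_snaive <- snaive(train, h = h)

accuracy(fc_hw, test)[, "RMSE"]       # training and test RMSE
accuracy(fc_hw_dmp, test)[, "RMSE"]
accuracy(fc_snaive, test)[, "RMSE"]
```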
## [1] "RMSE of Holt-Winter's multiplicative method: "
## Training set     Test set 
##     28.76835    257.16472
## [1] "RMSE of Holt-Winter's multiplicative damped method: "
## Training set     Test set 
##     28.82823    320.82257
## [1] "RMSE of seasonal naive method: "
## Training set     Test set 
##     83.42504    381.15950

1.6 Ex. 7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

1.6.1 Answer

Answer:

I use the stlf() function to perform this. Section 6.8 of the textbook introduces stlf() in its final paragraphs:

“stlf() is a short-cut approach that decomposes the time series using STL, forecasts the seasonally adjusted series, and returns the reseasonalised forecasts. The stlf() function uses mstl() to carry out the decomposition, so there are default values for s.window and t.window. If method is not specified, it will use the ETS approach applied to the seasonally adjusted series. Use the lambda argument if you think a Box-Cox transformation is required.”

  • From the graph, the STLF forecast tracks the test set more closely than the Holt-Winters’ multiplicative method.

  • The test-set RMSE of the stlf() forecast is the lowest of all the methods considered in Ex. 7.8.

        Method                                  Training Set    Test Set
        STLF                                        25.37597    125.22173
        Holt-Winters’ multiplicative                28.76835    257.16472
        Holt-Winters’ multiplicative, damped        28.82823    320.82257
        Seasonal naive                              83.42504    381.15950

Therefore, an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data, performs the best among all the methods.
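A sketch of the stlf() call, reusing train, test, and h from Ex. 7.8 and letting BoxCox.lambda() choose the transformation parameter:

```r
fc_stlf <- stlf(train, h = h, lambda = BoxCox.lambda(train))

accuracy(fc_stlf, test)[, "RMSE"]     # training and test RMSE

autoplot(fc_stlf) +
  autolayer(test, series = "Test set")
```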

## [1] "RMSE of STLF method: "
## Training set     Test set 
##     25.37597    125.22173