Forecasting: Principles and Practice

Instructions

From the book Forecasting: Principles and Practice by Hyndman, R.J. & Athanasopoulos, G.

Please submit exercises 7.1, 7.5, 7.6, 7.7, 7.8 and 7.9 from the Hyndman online forecasting book. Please submit both your RPubs link and the .Rmd file with your code.

Exercises

7.1

Consider the pigs series — the number of pigs slaughtered in Victoria each month.

First, we need to load the fpp2 package. This will automatically load the forecast package and all the data sets used in the book.

library(fpp2)

a)

Use the ses() function in R to find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.

Before we continue, let's take a look at this data set. Note that it records the monthly total number of pigs slaughtered in Victoria, Australia (Jan 1980 – Aug 1995).

Now, let's look at a classical decomposition of the data.
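A minimal sketch of how these plots could be produced (the exact chunk is not shown here, so treat this as an assumption):

autoplot(pigs) + ylab("Number of pigs slaughtered") + xlab("Year")
autoplot(decompose(pigs))  # classical additive decomposition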

Now we can use the ses() function to estimate the optimal parameter values of \(\alpha\) and \(l_0\).

I will use h = 4 because we want forecasts for the next four months; the data are monthly, with frequency 12.

fc <- ses(pigs, h=4)

From the model output below, we can determine that the optimal parameter values for the simple exponential smoothing model are \(\alpha\) = 0.2971 and \(l_0\) = 77260.06.

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

The table below shows the forecasts for the next four months.

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8

The graph below shows the data together with the fitted values.
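A sketch of how such a plot could be drawn (assuming the fc object created above):

autoplot(fc) +
  autolayer(fitted(fc), series = "Fitted") +
  ylab("Number of pigs slaughtered") + xlab("Year")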

b)

Compute a 95% prediction interval for the first forecast using \(\widehat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

Let's estimate the standard deviation of the residuals and check the residual diagnostics.

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 55.356, df = 22, p-value = 0.0001057
## 
## Model df: 2.   Total lags used: 24

From the above, the standard deviation of the residuals is: \(s\) = 10273.69.
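A minimal sketch of the manual calculation (assuming the fc object from above; R's own bounds are shown for comparison):

s <- sd(residuals(fc))                        # standard deviation of the residuals
fc$mean[1] + c(-1, 1) * 1.96 * s              # manual 95% interval for the first forecast
c(fc$lower[1, "95%"], fc$upper[1, "95%"])     # interval produced by R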

Let’s take a look at the following table:

Point.Forecast Lo.95 Hi.95
Sep 1995 Predicted 98816.41 78611.9685 119020.8438
Sep 1995 Manual 98816.41 78679.9673 118952.8450
Sep 1995 Difference 0.00 -67.9988 67.9988

Comparing the two, the values differ slightly: the manual interval is a bit narrower than the one produced by R, and the differences in the Lo 95 and Hi 95 bounds have the same magnitude with opposite signs, since both intervals are centred on the same point forecast.

7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

a)

Plot the series and discuss the main features of the data.

First, let’s have a look at the following table.

Now, let’s visualize it.
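A sketch of the plotting code (the exact chunk is not shown here):

autoplot(books) +
  ggtitle("Daily sales of paperback and hardcover books") +
  xlab("Day") + ylab("Sales")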

From the graph above we can identify the two series, Paperback and Hardcover. Both show an overall upward trend over the 30 days, with large day-to-day fluctuations that often move in opposite directions: when one series jumps up, the other tends to dip.

b)

Use the ses() function to forecast each series, and plot the forecasts.

I will use h = 4 because we want forecasts for the next four days; the series has frequency 1, i.e. daily observations indexed 1 to 30.

fc.p <- ses(books[,1], h = 4) # Paperback
fc.h <- ses(books[,2], h = 4) # Hardcover

The table below shows the Paperback forecasts for the next four days.

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
31 207.1097 162.4882 251.7311 138.8670 275.3523
32 207.1097 161.8589 252.3604 137.9046 276.3147
33 207.1097 161.2382 252.9811 136.9554 277.2639
34 207.1097 160.6259 253.5935 136.0188 278.2005

The table below shows the Hardcover forecasts for the next four days.

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
31 239.5601 197.2026 281.9176 174.7799 304.3403
32 239.5601 194.9788 284.1414 171.3788 307.7414
33 239.5601 192.8607 286.2595 168.1396 310.9806
34 239.5601 190.8347 288.2855 165.0410 314.0792

Let’s have a visualization of the forecasts.
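A sketch of how the two forecasts could be plotted together (assuming fc.p and fc.h from above):

autoplot(books) +
  autolayer(fc.p, series = "Paperback (SES)", PI = FALSE) +
  autolayer(fc.h, series = "Hardcover (SES)", PI = FALSE) +
  xlab("Day") + ylab("Sales")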

c)

Compute the RMSE values for the training data in each case.

RMSE stands for root-mean-square error and is given by:

\[RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N}{(y_i - \widehat{y_i})^2}}\]

Applying this to the training residuals, we have:

RMSE.p <- sqrt( sum( residuals(fc.p)^2 ) / length(books[,1]) ) # Paperback
RMSE.h <- sqrt( sum( residuals(fc.h)^2 ) / length(books[,2]) ) # Hardcover

From the above, we obtain the following RMSE values.

Paperback Hardcover
SES RMSE 33.63769 31.93101
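The same values can be read directly from accuracy() as a quick cross-check (assuming fc.p and fc.h from above):

accuracy(fc.p)["Training set", "RMSE"]  # Paperback
accuracy(fc.h)["Training set", "RMSE"]  # Hardcover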

Values returned from the Paperback model.

ME RMSE MAE MPE MAPE MASE ACF1
Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522

Values returned from the Hardcover model.

ME RMSE MAE MPE MAPE MASE ACF1
Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763

And, as you might notice, the manually calculated values match the values returned by the models.

7.6

a)

Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
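A minimal sketch of the Holt fits (the object names fc.hp and fc.hh are my own, not from the original):

fc.hp <- holt(books[, "Paperback"], h = 4)  # Holt's linear method, Paperback
fc.hh <- holt(books[, "Hardcover"], h = 4)  # Holt's linear method, Hardcover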

The table below shows the Holt forecasts for Paperback over the next four days.

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
31 209.4668 166.6035 252.3301 143.9130 275.0205
32 210.7177 167.8544 253.5811 145.1640 276.2715
33 211.9687 169.1054 254.8320 146.4149 277.5225
34 213.2197 170.3564 256.0830 147.6659 278.7735

The table below shows the Holt forecasts for Hardcover over the next four days.

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
31 250.1739 212.7390 287.6087 192.9222 307.4256
32 253.4765 216.0416 290.9113 196.2248 310.7282
33 256.7791 219.3442 294.2140 199.5274 314.0308
34 260.0817 222.6468 297.5166 202.8300 317.3334

Let’s have a visualization of the forecasts.

b)

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

First, let's compare the respective RMSEs. From the simple exponential smoothing models above, we obtained the following RMSE values.

Paperback Hardcover
SES RMSE 33.63769 31.93101

Holt’s values returned from the Paperback model.

ME RMSE MAE MPE MAPE MASE ACF1
Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122 -0.1750792

Holt’s values returned from the Hardcover model.

ME RMSE MAE MPE MAPE MASE ACF1
Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555 -0.0324519

Comparing the RMSEs, Holt's method gives a lower training RMSE than simple exponential smoothing for both series (31.14 vs 33.64 for Paperback, 27.19 vs 31.93 for Hardcover). This is not surprising, since Holt's method uses one extra parameter to capture the upward trend, and the improvement suggests that the trend component is worth including for these data.

c)

Compare the forecasts for the two series using both methods. Which do you think is best?

First, let’s perform a subtraction of the values in the following fashion:

[Forecast Simple Exponential Smoothing] - [Forecast Holt’s method]

Let’s check the Paperback first.

Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
31 -2.357101 -4.115252 -0.5989487 -5.045962 0.3317607
32 -3.608075 -5.995520 -1.2206295 -7.259358 0.0432077
33 -4.859049 -7.867155 -1.8509436 -9.459550 -0.2585487
34 -6.110024 -9.730502 -2.4895458 -11.647067 -0.5729806

Let’s check the Hardcover now.

Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
31 -10.61378 -15.53641 -5.691146 -18.14230 -3.085260
32 -13.91638 -21.06284 -6.769925 -24.84594 -2.986819
33 -17.21898 -26.48348 -7.954483 -31.38781 -3.050154
34 -20.52158 -31.81214 -9.231025 -37.78900 -3.254166

Comparing the two sets of forecasts and their RMSE values, Holt's linear method clearly captures the upward trend of these series better than simple exponential smoothing.

Looking at the differences above, the Holt point forecasts are larger than the SES ones (and they grow with the horizon, since Holt extrapolates the trend), while the Holt prediction intervals are slightly narrower, which is consistent with Holt's smaller residual variance. Overall, Holt's method looks like the better choice for both series.

d)

Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

First, let's estimate the standard deviation of the Paperback residuals and check the residual diagnostics.

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 15.081, df = 3, p-value = 0.001749
## 
## Model df: 4.   Total lags used: 7

From the above, the standard deviation of the residuals is: \(s\) = 31.44.

Now, let's estimate the standard deviation of the Hardcover residuals and check the residual diagnostics.

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 9.416, df = 3, p-value = 0.02424
## 
## Model df: 4.   Total lags used: 7

From the above, the standard deviation of the residuals is: \(s\) = 27.66.
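A sketch of the manual 95% intervals, using the Holt objects assumed in part a) above:

s.p <- sd(residuals(fc.hp))            # Paperback residual standard deviation
s.h <- sd(residuals(fc.hh))            # Hardcover residual standard deviation
fc.hp$mean[1] + c(-1, 1) * 1.96 * s.p  # manual 95% interval, Paperback
fc.hh$mean[1] + c(-1, 1) * 1.96 * s.h  # manual 95% interval, Hardcover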

Let’s take a look at the following Paperback table:

Point.Forecast Lo.95 Hi.95
31 Holt’s Paperback Predicted 209.4668 143.912986 275.020545
31 Holt’s Paperback Manual 209.4668 147.839010 271.094521
31 Holt’s Paperback Difference 0.0000 -3.926025 3.926025

Let’s take a look at the following Hardcover table:

Point.Forecast Lo.95 Hi.95
31 Holt’s Hardcover Predicted 250.1739 192.922171 307.425574
31 Holt’s Hardcover Manual 250.1739 195.963968 304.383776
31 Holt’s Hardcover Difference 0.0000 -3.041797 3.041797

Previously, we calculated the values for SES; for simplicity, let’s recap the tables:

Here is the SES Paperback.

Point.Forecast Lo.95 Hi.95
31 SES’ Paperback Predicted 207.1097 138.867024 275.352306
31 SES’ Paperback Manual 207.1097 141.596372 272.622958
31 SES’ Paperback Difference 0.0000 -2.729348 2.729348

And here is the SES’ Hardcover table.

Point.Forecast Lo.95 Hi.95
31 SES’ Hardcover Predicted 239.5601 174.779871 304.340313
31 SES’ Hardcover Manual 239.5601 178.584829 300.535355
31 SES’ Hardcover Difference 0.0000 -3.804958 3.804958

7.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

First, let’s have a view of the data:

Now, let’s have a visualization of it.

Under normal circumstances I would use something like h = 15, i.e. forecast 15 years ahead given the long history available; but following the hint, let's use h = 100 instead.

To choose a transformation, I calculated a Box-Cox lambda value of 0.3956183; since this is close to 1/3, I will use a cube-root transformation of the eggs series.
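The lambda value quoted above can be obtained as a quick check:

BoxCox.lambda(eggs)  # approximately 0.40, close to 1/3, i.e. a cube-root transformation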

eggs.BC <- eggs^(1/3)

Now, let’s compare the forecasting performance of the exponential smoothing methods.

fc1 <- holt(eggs, h=100)
fc2 <- holt(eggs, damped=TRUE, h=100)
fc3 <- holt(eggs.BC, h=100)
fc4 <- holt(eggs.BC, damped=TRUE, h=100)

Forecasts on the original scale.

Forecasts on the cube-root transformed scale.

One interesting feature is that Holt's linear method forecasts the price of a dozen eggs to fall below $0, while the damped method keeps the long-run forecasts roughly flat. The same behaviour appears with the transformed series.

Now, let's use time series cross-validation to compare the one-step forecast accuracy of the four methods (a sketch of the cross-validation code follows the table).

Holt HoltDamped BCHolt BCHoltDamped
Compare MSE 821.37428 847.71684 0.0709629 0.0720348
Compare MAE 20.95148 21.67774 0.2019644 0.2037994
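The table above could be produced with something like the following sketch (the error objects e1–e4 are my own names; note that the errors for the transformed series are on the cube-root scale):

e1 <- tsCV(eggs, holt, h = 1)
e2 <- tsCV(eggs, holt, damped = TRUE, h = 1)
e3 <- tsCV(eggs.BC, holt, h = 1)
e4 <- tsCV(eggs.BC, holt, damped = TRUE, h = 1)
mean(e1^2, na.rm = TRUE)     # MSE for Holt; repeat for e2, e3, e4
mean(abs(e1), na.rm = TRUE)  # MAE for Holt; repeat for e2, e3, e4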

Interestingly, the table suggests that Holt's method performs slightly "better" than the damped method on each scale; the problem is that Holt's method forecasts prices below $0. Note also that the errors for the transformed series are on the cube-root scale and are therefore not directly comparable with the errors on the original scale.

Which model gives the best RMSE?

ME RMSE MAE MPE MAPE MASE ACF1
Holt’s 0.0449909 26.5821882 19.1849063 -1.1422014 9.653791 0.9463626 0.0134820
Holt’s Damped -2.8914955 26.5401853 19.2794985 -2.9076331 10.018944 0.9510287 -0.0031954
Holt’s Cubic Root -0.0000765 0.2474836 0.1837553 -0.1700493 3.209218 0.9386606 0.0202290
Holt’s Cubic Root Damped -0.0158566 0.2491243 0.1875869 -0.4664749 3.290190 0.9582332 0.0115848

It is interesting that, on the original data, Holt's damped method gives the best RMSE, whereas on the cube-root transformed data the non-damped Holt method gives the best RMSE. I expected the two scales to behave the same way, so this is worth noting.

7.8

Recall your retail time series data (from Exercise 3 in Section 2.10).

a)

Why is multiplicative seasonality necessary for this series?

Let’s recall that series:

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)  # skip the metadata row

My previously selected time series was A3349627V, which represents liquor retailing turnover in New South Wales.
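A sketch of how the series could be constructed from the spreadsheet (the April 1982 start is taken from the retail data set and should be treated as an assumption here):

myts <- ts(retaildata[["A3349627V"]], frequency = 12, start = c(1982, 4))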

library(seasonal)
autoplot(seas(myts, x11 = ""))  # X11 decomposition of the series

autoplot(myts)

Now, from the visualizations above, we can conclude as follows:

Preamble: The additive method is preferred when the seasonal variations are roughly constant through the series. The multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component.

If we look closely at the graphs above, the seasonal variation grows with the level of the series rather than staying roughly constant, so only the multiplicative description fits. This is why multiplicative seasonality is necessary for this series.

b)

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
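A minimal sketch of the two fits (the object names fc.hw and fc.hwd, and the two-year horizon, are my own choices):

fc.hw  <- hw(myts, seasonal = "multiplicative", h = 24)
fc.hwd <- hw(myts, seasonal = "multiplicative", damped = TRUE, h = 24)
autoplot(myts) +
  autolayer(fc.hw, series = "HW multiplicative", PI = FALSE) +
  autolayer(fc.hwd, series = "HW multiplicative damped", PI = FALSE)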

c)

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

ME RMSE MAE MPE MAPE MASE ACF1
HW multiplicative forecasts 0.3076124 5.916806 4.201477 0.0724387 3.779909 0.4485563 -0.0312262
HW multiplicative forecasts Damped 0.3511596 5.628031 4.018041 0.1776603 3.685373 0.4289724 -0.0551768

From the table above, the damped HW multiplicative method gives slightly better training results (RMSE 5.63 vs 5.92).

d)

Check that the residuals from the best method look like white noise.

For a white noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly equal to zero, as there is some random variation. For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm \frac{2}{\sqrt{T}}\), where \(T\) is the length of the time series.

Let’s calculate this number:
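A sketch of the calculation, applied to the damped model preferred in part c) (the fc.hwd object is the one assumed above):

checkresiduals(fc.hwd)  # residual diagnostics: time plot, ACF, histogram, Ljung-Box test
2 / sqrt(length(myts))  # white-noise bound, roughly +/- 0.1 for this series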

From the calculations, about 2.6% of the autocorrelations lie outside the bound of \(\pm\) 0.1. Hence, even though the residuals look close to white noise, they do not strictly satisfy the definition of white noise.

e)

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naĂŻve approach from Exercise 8 in Section 3.7?
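A sketch of how the training/test comparison could be set up (the object names and exact calls are assumptions):

myts.train <- window(myts, end = c(2010, 12))
myts.test  <- window(myts, start = 2011)
fc.hw.train  <- hw(myts.train, seasonal = "multiplicative", h = length(myts.test))
fc.hwd.train <- hw(myts.train, seasonal = "multiplicative", damped = TRUE,
                   h = length(myts.test))
fc.sn <- snaive(myts.train, h = length(myts.test))
accuracy(fc.hw.train, myts.test)   # HW multiplicative
accuracy(fc.hwd.train, myts.test)  # HW multiplicative, damped
accuracy(fc.sn, myts.test)         # seasonal naive benchmark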

ME RMSE MAE MPE MAPE MASE ACF1
HW multiplicative forecasts 0.3301342 5.275059 3.825787 0.3442519 3.900028 0.4367368 0.0284223
HW multiplicative forecasts Damped 0.2243612 5.362182 3.837296 0.0562903 3.867764 0.4380506 0.0075028

Now, let's find the values from the seasonal naïve approach from Exercise 8 in Section 3.7.

Here are the accuracy results:

HW multiplicative forecasts

ME RMSE MAE MPE MAPE MASE ACF1 Theil.s.U
Training set 0.3301342 5.275059 3.825787 0.3442519 3.900028 0.4367368 0.0284223 NA
Test set 9.1057020 13.606928 10.885870 3.3900549 4.044825 1.2426880 0.5280136 0.3129365

HW multiplicative forecasts Damped

ME RMSE MAE MPE MAPE MASE ACF1 Theil.s.U
Training set 0.2243612 5.362182 3.837296 0.0562903 3.867764 0.4380506 0.0075028 NA
Test set 21.9487125 25.434378 22.070974 8.1823930 8.243831 2.5195354 0.6205859 0.5795209

Seasonal naĂŻve approach

ME RMSE MAE MPE MAPE MASE ACF1 Theil.s.U
Training set 0.2243612 5.362182 3.837296 0.0562903 3.867764 0.4380506 0.0075028 NA
Test set 21.9487125 25.434378 22.070974 8.1823930 8.243831 2.5195354 0.6205859 0.5795209

From the tables above, we can conclude that the HW multiplicative method (test-set RMSE 13.61) does indeed beat the seasonal naïve approach (test-set RMSE 25.43).

7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

To choose a transformation, I calculated a Box-Cox lambda value of -0.1452085; since this is close to 0, I will use a log transformation of the series.

myts.BC <- log(myts)
myts.train.BC <- log(myts.train)

Let’s visualize the components for the stl decomposition.
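A sketch of the decomposition, plus an STL + ETS forecast via stlf() as one way to carry out the exercise (object names are my own):

autoplot(stl(myts.train.BC, s.window = "periodic"))  # STL components of the log series
fc.stl <- stlf(myts.train.BC, s.window = "periodic", method = "ets",
               h = length(myts.test))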

Let’s take a look at the summary results given by ets.

## ETS(M,A,M) 
## 
## Call:
##  ets(y = myts.train.BC) 
## 
##   Smoothing parameters:
##     alpha = 0.4425 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 3.7553 
##     b = 0.0047 
##     s = 0.9971 0.9778 1.0022 1.0881 1.0152 1.0042
##            0.9891 0.9857 0.9805 0.9787 0.9891 0.9925
## 
##   sigma:  0.0112
## 
##       AIC      AICc       BIC 
## -32.28453 -30.34782  32.50487 
## 
## Training set error measures:
##                        ME       RMSE        MAE          MPE      MAPE
## Training set 0.0003987882 0.04879372 0.03639953 -0.001680687 0.8083399
##                   MASE       ACF1
## Training set 0.4380407 0.04535694

Now, let's visualize the forecasts.

From the graphs above, I do not see much difference between the STL and ETS forecasts of the Box-Cox (log) transformed series. Compared with the original data, both follow the seasonal pattern and the trend very closely.
