Forecasting: Principles and Practice
Instructions
From the book Forecasting: Principles and Practice by Hyndman, R. & Athanasopoulos, G.
Please submit exercises 7.1, 7.5, 7.6, 7.7, 7.8, and 7.9 from the Hyndman online Forecasting book. Please submit your RPubs link and attach the .Rmd file with your code.
Exercises
7.1
Consider the pigs
series — the number of pigs slaughtered in Victoria each month.
First, we need to load the fpp2 library. This will automatically load all the data sets used in the book.
library(fpp2)
a)
Use the ses()
function in R to find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.
Before we continue, let's get an idea of what this data set looks like. Note that it describes the monthly total number of pigs slaughtered in Victoria, Australia (Jan 1980 – Aug 1995).
Now, let’s have a visualization of the regular decomposition of the data.
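A minimal sketch of how this overview and decomposition could be produced (assuming fpp2 is already loaded, so the pigs series is available):

summary(pigs)               # quick numeric summary of the series
autoplot(decompose(pigs))   # classical decomposition: trend, seasonal, remainder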
Now, we can proceed to use the ses()
function to estimate the optimal parameter values of \(\alpha\) and \(l_0\).
Since this is monthly data (frequency 12) and we want forecasts for the next four months, I will use h = 4.
fc <- ses(pigs, h=4)
From the model summary below, we can determine that the optimal parameter values for the Simple Exponential Smoothing model are \(\alpha\) = 0.2971 and \(l_0\) = 77260.06.
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
The table below shows the forecasts for the next four months.
|          | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----------|----------------|----------|----------|----------|----------|
| Sep 1995 | 98816.41       | 85605.43 | 112027.4 | 78611.97 | 119020.8 |
| Oct 1995 | 98816.41       | 85034.52 | 112598.3 | 77738.83 | 119894.0 |
| Nov 1995 | 98816.41       | 84486.34 | 113146.5 | 76900.46 | 120732.4 |
| Dec 1995 | 98816.41       | 83958.37 | 113674.4 | 76092.99 | 121539.8 |
The graph below visualizes the data along with the fitted values.
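A sketch of the plotting call that could produce this graph (using the fc object fitted above):

autoplot(fc) +
  autolayer(fitted(fc), series = "Fitted") +
  xlab("Year") + ylab("Number of pigs slaughtered")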
b)
Compute a 95%
prediction interval for the first forecast using \(\widehat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
Let’s estimate the standard deviation from the residuals with some visualizations.
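This is a sketch of the diagnostics call that generates the residual plots and the Ljung-Box test shown below:

checkresiduals(fc)   # residual time plot, ACF, histogram and Ljung-Box test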
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 55.356, df = 22, p-value = 0.0001057
##
## Model df: 2. Total lags used: 24
From the above, the standard deviation of the residuals is: \(s\) = 10273.69.
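A minimal sketch of the manual interval calculation (object names as above):

s <- sd(residuals(fc))                         # standard deviation of the residuals
fc$mean[1] + c(-1, 1) * 1.96 * s               # manual 95% interval for the first forecast
c(fc$lower[1, "95%"], fc$upper[1, "95%"])      # interval produced by R, for comparison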
Let’s take a look at the following table:
|                     | Point Forecast | Lo 95      | Hi 95       |
|---------------------|----------------|------------|-------------|
| Sep 1995 Predicted  | 98816.41       | 78611.9685 | 119020.8438 |
| Sep 1995 Manual     | 98816.41       | 78679.9673 | 118952.8450 |
| Sep 1995 Difference | 0.00           | -67.9988   | 67.9988     |
By comparing, we notice that the values differ slightly; it is interesting to note that the differences in the Lo 95 and Hi 95 bounds have the same magnitude but opposite signs.
7.5
Data set books
contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
a)
Plot the series and discuss the main features of the data.
First, let’s have a look at the following table.
Now, let’s visualize it.
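A sketch of the plotting call, assuming the books data from fpp2:

autoplot(books) +
  xlab("Day") + ylab("Sales") +
  ggtitle("Daily sales of paperback and hardcover books")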
From the above graph, we can identify two series: one for Paperback and one for Hardcover. There is a lot of day-to-day fluctuation that often moves in opposite directions, that is, when one series goes up the other tends to go down. Overall, both series show an increasing trend over the whole period.
b)
Use the ses()
function to forecast each series, and plot the forecasts.
Since the data are daily (frequency 1) and we want forecasts for the next four days, I will use h = 4.
fc.p <- ses(books[,1], h = 4) # Paperback
fc.h <- ses(books[,2], h = 4) # Hardcover
The table below shows the Paperback forecasts for the next four days.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 207.1097       | 162.4882 | 251.7311 | 138.8670 | 275.3523 |
| 32 | 207.1097       | 161.8589 | 252.3604 | 137.9046 | 276.3147 |
| 33 | 207.1097       | 161.2382 | 252.9811 | 136.9554 | 277.2639 |
| 34 | 207.1097       | 160.6259 | 253.5935 | 136.0188 | 278.2005 |
The table below shows the Hardcover forecasts for the next four days.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 239.5601       | 197.2026 | 281.9176 | 174.7799 | 304.3403 |
| 32 | 239.5601       | 194.9788 | 284.1414 | 171.3788 | 307.7414 |
| 33 | 239.5601       | 192.8607 | 286.2595 | 168.1396 | 310.9806 |
| 34 | 239.5601       | 190.8347 | 288.2855 | 165.0410 | 314.0792 |
Let’s have a visualization of the forecasts.
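A sketch of the plots, one per series, showing the fitted values and the SES forecasts:

autoplot(fc.p) + autolayer(fitted(fc.p), series = "Fitted") + ylab("Paperback sales")
autoplot(fc.h) + autolayer(fitted(fc.h), series = "Fitted") + ylab("Hardcover sales")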
c)
Compute the RMSE values for the training data in each case.
RMSE stands for Root Mean Square Error, and its formula is given by:
\[RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N}{(y_i - \widehat{y_i})^2}}\] We have as follows:
RMSE.p <- sqrt( sum( residuals(fc.p)^2 ) / length(books[,1]) ) # Paperback
RMSE.h <- sqrt( sum( residuals(fc.h)^2 ) / length(books[,2]) ) # Hardcover
From the above, we obtain the following RMSE values.
|          | Paperback | Hardcover |
|----------|-----------|-----------|
| SES RMSE | 33.63769  | 31.93101  |
Values returned from the Paperback model.
|              | ME       | RMSE     | MAE     | MPE       | MAPE     | MASE      | ACF1       |
|--------------|----------|----------|---------|-----------|----------|-----------|------------|
| Training set | 7.175981 | 33.63769 | 27.8431 | 0.4736071 | 15.57784 | 0.7021303 | -0.2117522 |
Values returned from the Hardcover model.
|              | ME       | RMSE     | MAE      | MPE      | MAPE     | MASE      | ACF1       |
|--------------|----------|----------|----------|----------|----------|-----------|------------|
| Training set | 9.166735 | 31.93101 | 26.77319 | 2.636189 | 13.39487 | 0.7987887 | -0.1417763 |
And, as you might notice, the manually calculated values match the values returned by the model.
7.6
a)
Now apply Holt’s linear method to the paperback
and hardback
series and compute four-day forecasts in each case.
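A sketch of the Holt fits behind the tables below; the object names fc.hp and fc.hh are my own choice:

fc.hp <- holt(books[, "Paperback"], h = 4)   # Holt's linear method, Paperback
fc.hh <- holt(books[, "Hardcover"], h = 4)   # Holt's linear method, Hardcover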
The table below shows the Paperback forecasts for the next four days using Holt's method.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 209.4668       | 166.6035 | 252.3301 | 143.9130 | 275.0205 |
| 32 | 210.7177       | 167.8544 | 253.5811 | 145.1640 | 276.2715 |
| 33 | 211.9687       | 169.1054 | 254.8320 | 146.4149 | 277.5225 |
| 34 | 213.2197       | 170.3564 | 256.0830 | 147.6659 | 278.7735 |
The table below shows the Hardcover forecasts for the next four days using Holt's method.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 250.1739       | 212.7390 | 287.6087 | 192.9222 | 307.4256 |
| 32 | 253.4765       | 216.0416 | 290.9113 | 196.2248 | 310.7282 |
| 33 | 256.7791       | 219.3442 | 294.2140 | 199.5274 | 314.0308 |
| 34 | 260.0817       | 222.6468 | 297.5166 | 202.8300 | 317.3334 |
Let’s have a visualization of the forecasts.
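A sketch of the plots, reusing the fc.hp and fc.hh objects assumed above:

autoplot(fc.hp) + autolayer(fitted(fc.hp), series = "Fitted") + ylab("Paperback sales")
autoplot(fc.hh) + autolayer(fitted(fc.hh), series = "Fitted") + ylab("Hardcover sales")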
b)
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
First, let's compare the respective RMSEs. From the previous Simple Exponential Smoothing models, we obtained the following RMSE values.
|          | Paperback | Hardcover |
|----------|-----------|-----------|
| SES RMSE | 33.63769  | 31.93101  |
Holt’s values returned from the Paperback model.
|              | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1       |
|--------------|-----------|----------|----------|-----------|----------|-----------|------------|
| Training set | -3.717178 | 31.13692 | 26.18083 | -5.508526 | 15.58354 | 0.6602122 | -0.1750792 |
Holt’s values returned from the Hardcover model.
|              | ME         | RMSE     | MAE      | MPE       | MAPE    | MASE      | ACF1       |
|--------------|------------|----------|----------|-----------|---------|-----------|------------|
| Training set | -0.1357882 | 27.19358 | 23.15557 | -2.114792 | 12.1626 | 0.6908555 | -0.0324519 |
Comparing the Simple Exponential Smoothing and Holt RMSE values, we notice a clear difference between them: Holt's method shows a lower RMSE than the Simple Exponential Smoothing model for both series.
c)
Compare the forecasts for the two series using both methods. Which do you think is best?
First, let’s perform a subtraction of the values in the following fashion:
[Forecast Simple Exponential Smoothing] - [Forecast Holt’s method]
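One way this subtraction could be computed, as a sketch using the forecast objects assumed above:

as.data.frame(fc.p) - as.data.frame(fc.hp)   # Paperback: SES minus Holt
as.data.frame(fc.h) - as.data.frame(fc.hh)   # Hardcover: SES minus Holt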
Let’s check the Paperback first.
|    | Point Forecast | Lo 80     | Hi 80      | Lo 95      | Hi 95      |
|----|----------------|-----------|------------|------------|------------|
| 31 | -2.357101      | -4.115252 | -0.5989487 | -5.045962  | 0.3317607  |
| 32 | -3.608075      | -5.995520 | -1.2206295 | -7.259358  | 0.0432077  |
| 33 | -4.859049      | -7.867155 | -1.8509436 | -9.459550  | -0.2585487 |
| 34 | -6.110024      | -9.730502 | -2.4895458 | -11.647067 | -0.5729806 |
Let’s check the Hardcover now.
|    | Point Forecast | Lo 80     | Hi 80     | Lo 95     | Hi 95     |
|----|----------------|-----------|-----------|-----------|-----------|
| 31 | -10.61378      | -15.53641 | -5.691146 | -18.14230 | -3.085260 |
| 32 | -13.91638      | -21.06284 | -6.769925 | -24.84594 | -2.986819 |
| 33 | -17.21898      | -26.48348 | -7.954483 | -31.38781 | -3.050154 |
| 34 | -20.52158      | -31.81214 | -9.231025 | -37.78900 | -3.254166 |
If we compare the two forecasts by their RMSE values, Holt's linear method captures the behavior of these trending series better than the Simple Exponential Smoothing method.
Also, looking at the differences above, it is interesting to note that the point forecasts from Holt's method are larger than the SES forecasts, and the gap grows with the horizon, since Holt's method projects the estimated trend forward while SES produces a flat forecast; the prediction-interval bounds are shifted upward accordingly.
d)
Calculate a 95%
prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
First, let’s estimate the standard deviation from the Paperback residuals with some visualizations.
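A sketch of the diagnostics call for the Paperback residuals; the same call with fc.hh gives the Hardcover output shown further below:

checkresiduals(fc.hp)   # residual plots and Ljung-Box test for Holt's Paperback model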
##
## Ljung-Box test
##
## data: Residuals from Holt's method
## Q* = 15.081, df = 3, p-value = 0.001749
##
## Model df: 4. Total lags used: 7
From the above, the standard deviation of the residuals is: \(s\) = 31.44.
Now, let’s estimate the standard deviation from the Hardcover residuals with some visualizations.
##
## Ljung-Box test
##
## data: Residuals from Holt's method
## Q* = 9.416, df = 3, p-value = 0.02424
##
## Model df: 4. Total lags used: 7
From the above, the standard deviation of the residuals is: \(s\) = 27.66.
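A minimal sketch of the manual intervals, using the residual standard deviations quoted above (object names as assumed earlier):

s.hp <- sd(residuals(fc.hp))               # about 31.44 for Paperback
s.hh <- sd(residuals(fc.hh))               # about 27.66 for Hardcover
fc.hp$mean[1] + c(-1, 1) * 1.96 * s.hp     # manual 95% interval, Paperback
fc.hh$mean[1] + c(-1, 1) * 1.96 * s.hh     # manual 95% interval, Hardcover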
Let’s take a look at the following Paperback table:
|                                | Point Forecast | Lo 95      | Hi 95      |
|--------------------------------|----------------|------------|------------|
| 31 Holt's Paperback Predicted  | 209.4668       | 143.912986 | 275.020545 |
| 31 Holt's Paperback Manual     | 209.4668       | 147.839010 | 271.094521 |
| 31 Holt's Paperback Difference | 0.0000         | -3.926025  | 3.926025   |
Let’s take a look at the following Hardcover table:
|                                | Point Forecast | Lo 95      | Hi 95      |
|--------------------------------|----------------|------------|------------|
| 31 Holt's Hardcover Predicted  | 250.1739       | 192.922171 | 307.425574 |
| 31 Holt's Hardcover Manual     | 250.1739       | 195.963968 | 304.383776 |
| 31 Holt's Hardcover Difference | 0.0000         | -3.041797  | 3.041797   |
Previously, we calculated the values for SES; for simplicity, let’s recap the tables:
Here is the SES Paperback.
|                             | Point Forecast | Lo 95      | Hi 95      |
|-----------------------------|----------------|------------|------------|
| 31 SES Paperback Predicted  | 207.1097       | 138.867024 | 275.352306 |
| 31 SES Paperback Manual     | 207.1097       | 141.596372 | 272.622958 |
| 31 SES Paperback Difference | 0.0000         | -2.729348  | 2.729348   |
And here is the SES’ Hardcover table.
|                             | Point Forecast | Lo 95      | Hi 95      |
|-----------------------------|----------------|------------|------------|
| 31 SES Hardcover Predicted  | 239.5601       | 174.779871 | 304.340313 |
| 31 SES Hardcover Manual     | 239.5601       | 178.584829 | 300.535355 |
| 31 SES Hardcover Difference | 0.0000         | -3.804958  | 3.804958   |
7.7
For this exercise use data set eggs
, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt()
function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt()
so you can clearly see the differences between the various options when plotting the forecasts.]
First, let’s have a view of the data:
Now, let’s have a visualization of it.
Under normal circumstances, I would use h = 15, that is, forecast 15 years ahead, since the data cover a good number of years; but based on the hint, let's use h = 100 instead.
To compare, I calculated a Box-Cox lambda value of 0.3956183; since this value is close to 1/3, I will use the cube root as my transformation of the eggs series.
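A sketch of how this lambda can be obtained:

BoxCox.lambda(eggs)   # about 0.3956, close to 1/3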
eggs.BC <- eggs^(1/3)
Now, let’s compare the forecasting performance of the exponential smoothing methods.
fc1 <- holt(eggs, h=100)
fc2 <- holt(eggs, damped=TRUE, h=100)
fc3 <- holt(eggs.BC, h=100)
fc4 <- holt(eggs.BC, damped=TRUE, h=100)
Regular predictions.
Transformed predictions.
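A sketch of how the two comparison plots could be drawn from the four forecast objects above:

autoplot(eggs) +
  autolayer(fc1, series = "Holt", PI = FALSE) +
  autolayer(fc2, series = "Holt damped", PI = FALSE)
autoplot(eggs.BC) +
  autolayer(fc3, series = "Holt (cube root)", PI = FALSE) +
  autolayer(fc4, series = "Holt damped (cube root)", PI = FALSE)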
Something interesting about this behavior is that Holt's linear method eventually predicts a price per dozen eggs below $0, while Holt's damped method keeps the forecast flat. The same observation applies to the transformed series as well.
Now, let's use time series cross-validation to compare the one-step forecast accuracy of the four methods.
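A minimal sketch of the cross-validation behind the table below (the error object names are my own):

e1 <- tsCV(eggs, holt, h = 1)                     # Holt
e2 <- tsCV(eggs, holt, damped = TRUE, h = 1)      # Holt damped
e3 <- tsCV(eggs.BC, holt, h = 1)                  # Holt, cube-root scale
e4 <- tsCV(eggs.BC, holt, damped = TRUE, h = 1)   # Holt damped, cube-root scale
mean(e1^2, na.rm = TRUE)                          # MSE for the first method; MAE uses mean(abs(e1), na.rm = TRUE)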
|             | Holt      | HoltDamped | BCHolt    | BCHoltDamped |
|-------------|-----------|------------|-----------|--------------|
| Compare MSE | 821.37428 | 847.71684  | 0.0709629 | 0.0720348    |
| Compare MAE | 20.95148  | 21.67774   | 0.2019644 | 0.2037994    |
Something interesting from the above table is the suggestion that Holt's linear method performs "better" than the damped method; the problem is that Holt's linear method predicts prices below $0. Note also that the MSE and MAE for the transformed series are on the cube-root scale, so they cannot be compared directly with the untransformed values.
Which model gives the best RMSE?
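A sketch of how the training-set measures below can be pulled from the fitted models:

accuracy(fc1)   # Holt
accuracy(fc2)   # Holt damped
accuracy(fc3)   # Holt, cube root
accuracy(fc4)   # Holt damped, cube root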
|                          | ME         | RMSE       | MAE        | MPE        | MAPE      | MASE      | ACF1       |
|--------------------------|------------|------------|------------|------------|-----------|-----------|------------|
| Holt's                   | 0.0449909  | 26.5821882 | 19.1849063 | -1.1422014 | 9.653791  | 0.9463626 | 0.0134820  |
| Holt's Damped            | -2.8914955 | 26.5401853 | 19.2794985 | -2.9076331 | 10.018944 | 0.9510287 | -0.0031954 |
| Holt's Cubic Root        | -0.0000765 | 0.2474836  | 0.1837553  | -0.1700493 | 3.209218  | 0.9386606 | 0.0202290  |
| Holt's Cubic Root Damped | -0.0158566 | 0.2491243  | 0.1875869  | -0.4664749 | 3.290190  | 0.9582332 | 0.0115848  |
It is very interesting to note that, on the non-transformed data, the model with the best RMSE is Holt's damped method, while on the cube-root transformed data the plain Holt's method gives the best RMSE. This surprised me, since I was expecting the same behavior in both cases.
7.8
Recall your retail time series data (from Exercise 3 in Section 2.10).
a)
Why is multiplicative seasonality necessary for this series?
Let’s recall that series:
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349627V"], frequency = 12, start = c(1982, 4))   # monthly series starting Apr 1982, as in the book's retail example
My previously selected time series was A3349627V
; it represents liquor retailing turnover in New South Wales.
seas(myts, x11="")
autoplot(myts)
Now, from all the above visualizations, we can conclude as follows:
Preamble: The additive method is preferred when the seasonal variations are roughly constant through the series. The multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component.
Now, if we pay close attention to the above graphs, we can see that only the multiplicative description is satisfied here, since the seasonal variation is not roughly constant through the series; it increases with the level of the series.
b)
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
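A sketch of the two fits, with hypothetical object names and assuming myts as defined above:

fit.hw  <- hw(myts, seasonal = "multiplicative")                 # Holt-Winters multiplicative
fit.hwd <- hw(myts, seasonal = "multiplicative", damped = TRUE)  # with damped trend
autoplot(fit.hw)
autoplot(fit.hwd)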
c)
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
|                                    | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1       |
|------------------------------------|-----------|----------|----------|-----------|----------|-----------|------------|
| HW multiplicative forecasts        | 0.3076124 | 5.916806 | 4.201477 | 0.0724387 | 3.779909 | 0.4485563 | -0.0312262 |
| HW multiplicative forecasts Damped | 0.3511596 | 5.628031 | 4.018041 | 0.1776603 | 3.685373 | 0.4289724 | -0.0551768 |
From the above table, it seems that the damped HW multiplicative method gives better results (lower RMSE).
d)
Check that the residuals from the best method look like white noise.
For a white noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly equal to zero as there is some random variation. For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm \frac{2}{\sqrt{T}}\), where \(T\) is the length of the time series.
Let’s calculate this number:
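A sketch of one way to do this calculation, assuming fit.hwd is the selected (damped) model:

res   <- residuals(fit.hwd)
bound <- 2 / sqrt(length(res))                          # roughly 0.1 for this series
r     <- Acf(res, lag.max = 24, plot = FALSE)$acf[-1]   # autocorrelations, lag 0 dropped
mean(abs(r) > bound)                                    # proportion of spikes outside the bound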
From the calculations, about 2.6% of the autocorrelations lie outside the bound of \(\pm\) 0.1. Hence, although the residuals look very close to white noise, we cannot say they are exactly white noise in the strict sense.
e)
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naĂŻve approach from Exercise 8 in Section 3.7?
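A sketch of the train/test split and test-set accuracy, with hypothetical object names:

myts.train <- window(myts, end = c(2010, 12))
myts.test  <- window(myts, start = 2011)
fit.hw.tr  <- hw(myts.train, seasonal = "multiplicative", h = length(myts.test))
fit.hwd.tr <- hw(myts.train, seasonal = "multiplicative", damped = TRUE, h = length(myts.test))
accuracy(fit.hw.tr, myts.test)    # training and test measures, plain HW
accuracy(fit.hwd.tr, myts.test)   # training and test measures, damped HW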
|                                    | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1      |
|------------------------------------|-----------|----------|----------|-----------|----------|-----------|-----------|
| HW multiplicative forecasts        | 0.3301342 | 5.275059 | 3.825787 | 0.3442519 | 3.900028 | 0.4367368 | 0.0284223 |
| HW multiplicative forecasts Damped | 0.2243612 | 5.362182 | 3.837296 | 0.0562903 | 3.867764 | 0.4380506 | 0.0075028 |
Now, let's find the values from the seasonal naĂŻve approach from Exercise 8 in Section 3.7.
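A sketch of the benchmark, reusing the training window assumed above:

fit.sn <- snaive(myts.train, h = length(myts.test))   # seasonal naive forecasts
accuracy(fit.sn, myts.test)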
Here are the accuracy results:
HW multiplicative forecasts
|              | ME        | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1      | Theil's U |
|--------------|-----------|-----------|-----------|-----------|----------|-----------|-----------|-----------|
| Training set | 0.3301342 | 5.275059  | 3.825787  | 0.3442519 | 3.900028 | 0.4367368 | 0.0284223 | NA        |
| Test set     | 9.1057020 | 13.606928 | 10.885870 | 3.3900549 | 4.044825 | 1.2426880 | 0.5280136 | 0.3129365 |
HW multiplicative forecasts Damped
|              | ME         | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1      | Theil's U |
|--------------|------------|-----------|-----------|-----------|----------|-----------|-----------|-----------|
| Training set | 0.2243612  | 5.362182  | 3.837296  | 0.0562903 | 3.867764 | 0.4380506 | 0.0075028 | NA        |
| Test set     | 21.9487125 | 25.434378 | 22.070974 | 8.1823930 | 8.243831 | 2.5195354 | 0.6205859 | 0.5795209 |
Seasonal naĂŻve approach
|              | ME         | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1      | Theil's U |
|--------------|------------|-----------|-----------|-----------|----------|-----------|-----------|-----------|
| Training set | 0.2243612  | 5.362182  | 3.837296  | 0.0562903 | 3.867764 | 0.4380506 | 0.0075028 | NA        |
| Test set     | 21.9487125 | 25.434378 | 22.070974 | 8.1823930 | 8.243831 | 2.5195354 | 0.6205859 | 0.5795209 |
From the above tables, we can conclude that the HW multiplicative approach does indeed beat the seasonal naĂŻve approach on the test set.
7.9
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
To compare, I calculated a Box-Cox lambda value of -0.1452085; since this value is near 0, I will use the log as my transformation of the series.
myts.BC <- log(myts)
myts.train.BC <- log(myts.train)
Let’s visualize the components for the stl decomposition.
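A sketch of the decomposition call on the log-transformed training series:

fit.stl <- stl(myts.train.BC, s.window = "periodic", robust = TRUE)
autoplot(fit.stl)   # trend, seasonal and remainder components on the log scale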
Let’s take a look at the summary results given by ets.
## ETS(M,A,M)
##
## Call:
## ets(y = myts.train.BC)
##
## Smoothing parameters:
## alpha = 0.4425
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 3.7553
## b = 0.0047
## s = 0.9971 0.9778 1.0022 1.0881 1.0152 1.0042
## 0.9891 0.9857 0.9805 0.9787 0.9891 0.9925
##
## sigma: 0.0112
##
## AIC AICc BIC
## -32.28453 -30.34782 32.50487
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.0003987882 0.04879372 0.03639953 -0.001680687 0.8083399
## MASE ACF1
## Training set 0.4380407 0.04535694
Now, let’s visualize the components.
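A sketch of two plots that could be shown here: the fitted ETS components (the model summarized above) and an STL + ETS forecast produced with stlf, which combines the decomposition and ETS steps:

fit.ets <- ets(myts.train.BC)   # the ETS(M,A,M) model summarized above
autoplot(fit.ets)               # level, slope and seasonal components

fc.stl <- stlf(myts.train.BC, s.window = "periodic", method = "ets", h = length(myts.test))
autoplot(fc.stl)                # STL + ETS forecasts on the log scale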
From the above graphs, I do not identify much difference on the Box-Cox transformed series between the STL and ETS forecasting approaches. Comparing the forecasts to the original data, they follow the patterns quite well, capturing both the seasonality and the trend to a good approximation.