Forecasting: Principles and Practice
Instructions
From the book Forecasting: Principles and Practice by Hyndman, R. & Athanasopoulos, G.
Please submit exercises 7.1, 7.5, 7.6, 7.7, 7.8, and 7.9 from the Hyndman online Forecasting book. Please submit your RPubs link and attach the .Rmd file with your code.
Exercises
7.1
Consider the pigs
series — the number of pigs slaughtered in Victoria each month.
First, we need to load the fpp2 library. This will automatically load all the data sets used in the book.
library(fpp2)
a)
Use the ses()
function in R to find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.
Before we continue, let's get an idea of what this data set looks like. Note that it describes the monthly total number of pigs slaughtered in Victoria, Australia (Jan 1980 – Aug 1995).
Now, let’s have a visualization of the regular decomposition of the data.
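A minimal sketch of how this overview and decomposition could be produced (assuming fpp2 is already loaded, so the pigs series is available):

summary(pigs)               # quick numeric summary of the series
autoplot(decompose(pigs))   # classical decomposition: trend, seasonal, remainder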
Now, we can proceed to use the ses()
function to estimate the optimal parameter values of \(\alpha\) and \(l_0\).
Since this is monthly data (frequency 12) and we want forecasts for the next four months, I will use h = 4.
fc <- ses(pigs, h=4)
From the model summary below, we can determine that the optimal parameter values for the Simple Exponential Smoothing model are \(\alpha\) = 0.2971 and \(l_0\) = 77260.06.
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
The table below shows the forecasts for the next four months.
|          | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----------|----------------|----------|----------|----------|----------|
| Sep 1995 | 98816.41       | 85605.43 | 112027.4 | 78611.97 | 119020.8 |
| Oct 1995 | 98816.41       | 85034.52 | 112598.3 | 77738.83 | 119894.0 |
| Nov 1995 | 98816.41       | 84486.34 | 113146.5 | 76900.46 | 120732.4 |
| Dec 1995 | 98816.41       | 83958.37 | 113674.4 | 76092.99 | 121539.8 |
The graph below visualizes the data along with the fitted values.
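A sketch of the plotting call that could produce this graph (using the fc object fitted above):

autoplot(fc) +
  autolayer(fitted(fc), series = "Fitted") +
  xlab("Year") + ylab("Number of pigs slaughtered")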
b)
Compute a 95%
prediction interval for the first forecast using \(\widehat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
Let’s estimate the standard deviation from the residuals with some visualizations.
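This is a sketch of the diagnostics call that generates the residual plots and the Ljung-Box test shown below:

checkresiduals(fc)   # residual time plot, ACF, histogram and Ljung-Box test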
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 55.356, df = 22, p-value = 0.0001057
##
## Model df: 2. Total lags used: 24
From the above, the standard deviation of the residuals is: \(s\) = 10273.69.
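A minimal sketch of the manual interval calculation (object names as above):

s <- sd(residuals(fc))                         # standard deviation of the residuals
fc$mean[1] + c(-1, 1) * 1.96 * s               # manual 95% interval for the first forecast
c(fc$lower[1, "95%"], fc$upper[1, "95%"])      # interval produced by R, for comparison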
Let’s take a look at the following table:
|                     | Point Forecast | Lo 95      | Hi 95       |
|---------------------|----------------|------------|-------------|
| Sep 1995 Predicted  | 98816.41       | 78611.9685 | 119020.8438 |
| Sep 1995 Manual     | 98816.41       | 78679.9673 | 118952.8450 |
| Sep 1995 Difference | 0.00           | -67.9988   | 67.9988     |
By comparing, we notice that the values differ slightly; it is interesting to note that the differences in the Lo 95 and Hi 95 bounds have the same magnitude but opposite signs.
7.5
Data set books
contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
a)
Plot the series and discuss the main features of the data.
First, let’s have a look at the following table.
Now, let’s visualize it.
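A sketch of the plotting call, assuming the books data from fpp2:

autoplot(books) +
  xlab("Day") + ylab("Sales") +
  ggtitle("Daily sales of paperback and hardcover books")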
From the above graph, we can identify two series: one for Paperback and one for Hardcover. There is a lot of day-to-day fluctuation that often moves in opposite directions, that is, when one series goes up the other tends to go down. Overall, both series show an increasing trend over the whole period.
b)
Use the ses()
function to forecast each series, and plot the forecasts.
Since the data are daily (frequency 1) and we want forecasts for the next four days, I will use h = 4.
fc.p <- ses(books[,1], h = 4) # Paperback
fc.h <- ses(books[,2], h = 4) # Hardcover
The table below shows the Paperback forecasts for the next four days.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 207.1097       | 162.4882 | 251.7311 | 138.8670 | 275.3523 |
| 32 | 207.1097       | 161.8589 | 252.3604 | 137.9046 | 276.3147 |
| 33 | 207.1097       | 161.2382 | 252.9811 | 136.9554 | 277.2639 |
| 34 | 207.1097       | 160.6259 | 253.5935 | 136.0188 | 278.2005 |
The table below shows the Hardcover forecasts for the next four days.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 239.5601       | 197.2026 | 281.9176 | 174.7799 | 304.3403 |
| 32 | 239.5601       | 194.9788 | 284.1414 | 171.3788 | 307.7414 |
| 33 | 239.5601       | 192.8607 | 286.2595 | 168.1396 | 310.9806 |
| 34 | 239.5601       | 190.8347 | 288.2855 | 165.0410 | 314.0792 |
Let’s have a visualization of the forecasts.
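A sketch of the plots, one per series, showing the fitted values and the SES forecasts:

autoplot(fc.p) + autolayer(fitted(fc.p), series = "Fitted") + ylab("Paperback sales")
autoplot(fc.h) + autolayer(fitted(fc.h), series = "Fitted") + ylab("Hardcover sales")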
c)
Compute the RMSE values for the training data in each case.
RMSE stands for Root Mean Square Error, and its formula is given by:
\[RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N}{(y_i - \widehat{y_i})^2}}\] We have as follows:
RMSE.p <- sqrt( sum( residuals(fc.p)^2 ) / length(books[,1]) ) # Paperback
RMSE.h <- sqrt( sum( residuals(fc.h)^2 ) / length(books[,2]) ) # Hardcover
From the above, we obtain the following RMSE values.
|          | Paperback | Hardcover |
|----------|-----------|-----------|
| SES RMSE | 33.63769  | 31.93101  |
Values returned from the Paperback model.
|              | ME       | RMSE     | MAE     | MPE       | MAPE     | MASE      | ACF1       |
|--------------|----------|----------|---------|-----------|----------|-----------|------------|
| Training set | 7.175981 | 33.63769 | 27.8431 | 0.4736071 | 15.57784 | 0.7021303 | -0.2117522 |
Values returned from the Hardcover model.
|              | ME       | RMSE     | MAE      | MPE      | MAPE     | MASE      | ACF1       |
|--------------|----------|----------|----------|----------|----------|-----------|------------|
| Training set | 9.166735 | 31.93101 | 26.77319 | 2.636189 | 13.39487 | 0.7987887 | -0.1417763 |
And, as you might notice, the manually calculated values match the values returned by the model.
7.6
a)
Now apply Holt’s linear method to the paperback
and hardback
series and compute four-day forecasts in each case.
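A sketch of the Holt fits behind the tables below; the object names fc.hp and fc.hh are my own choice:

fc.hp <- holt(books[, "Paperback"], h = 4)   # Holt's linear method, Paperback
fc.hh <- holt(books[, "Hardcover"], h = 4)   # Holt's linear method, Hardcover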
The table below shows the Paperback forecasts for the next four days using Holt's method.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 209.4668       | 166.6035 | 252.3301 | 143.9130 | 275.0205 |
| 32 | 210.7177       | 167.8544 | 253.5811 | 145.1640 | 276.2715 |
| 33 | 211.9687       | 169.1054 | 254.8320 | 146.4149 | 277.5225 |
| 34 | 213.2197       | 170.3564 | 256.0830 | 147.6659 | 278.7735 |
The table below shows the Hardcover forecasts for the next four days using Holt's method.
|    | Point Forecast | Lo 80    | Hi 80    | Lo 95    | Hi 95    |
|----|----------------|----------|----------|----------|----------|
| 31 | 250.1739       | 212.7390 | 287.6087 | 192.9222 | 307.4256 |
| 32 | 253.4765       | 216.0416 | 290.9113 | 196.2248 | 310.7282 |
| 33 | 256.7791       | 219.3442 | 294.2140 | 199.5274 | 314.0308 |
| 34 | 260.0817       | 222.6468 | 297.5166 | 202.8300 | 317.3334 |
Let’s have a visualization of the forecasts.
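A sketch of the plots, reusing the fc.hp and fc.hh objects assumed above:

autoplot(fc.hp) + autolayer(fitted(fc.hp), series = "Fitted") + ylab("Paperback sales")
autoplot(fc.hh) + autolayer(fitted(fc.hh), series = "Fitted") + ylab("Hardcover sales")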
b)
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
First, let's compare the respective RMSEs. From the previous Simple Exponential Smoothing models, we obtained the following RMSE values.
|          | Paperback | Hardcover |
|----------|-----------|-----------|
| SES RMSE | 33.63769  | 31.93101  |
Holt’s values returned from the Paperback model.
|              | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1       |
|--------------|-----------|----------|----------|-----------|----------|-----------|------------|
| Training set | -3.717178 | 31.13692 | 26.18083 | -5.508526 | 15.58354 | 0.6602122 | -0.1750792 |
Holt’s values returned from the Hardcover model.
|              | ME         | RMSE     | MAE      | MPE       | MAPE    | MASE      | ACF1       |
|--------------|------------|----------|----------|-----------|---------|-----------|------------|
| Training set | -0.1357882 | 27.19358 | 23.15557 | -2.114792 | 12.1626 | 0.6908555 | -0.0324519 |
Comparing the Simple Exponential Smoothing and Holt RMSE values, we notice a clear difference between them: Holt's method shows a lower RMSE than the Simple Exponential Smoothing model for both series.
c)
Compare the forecasts for the two series using both methods. Which do you think is best?
First, let’s perform a subtraction of the values in the following fashion:
[Forecast Simple Exponential Smoothing] - [Forecast Holt’s method]
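One way this subtraction could be computed, as a sketch using the forecast objects assumed above:

as.data.frame(fc.p) - as.data.frame(fc.hp)   # Paperback: SES minus Holt
as.data.frame(fc.h) - as.data.frame(fc.hh)   # Hardcover: SES minus Holt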
Let’s check the Paperback first.
|    | Point Forecast | Lo 80     | Hi 80      | Lo 95      | Hi 95      |
|----|----------------|-----------|------------|------------|------------|
| 31 | -2.357101      | -4.115252 | -0.5989487 | -5.045962  | 0.3317607  |
| 32 | -3.608075      | -5.995520 | -1.2206295 | -7.259358  | 0.0432077  |
| 33 | -4.859049      | -7.867155 | -1.8509436 | -9.459550  | -0.2585487 |
| 34 | -6.110024      | -9.730502 | -2.4895458 | -11.647067 | -0.5729806 |
Let’s check the Hardcover now.
|    | Point Forecast | Lo 80     | Hi 80     | Lo 95     | Hi 95     |
|----|----------------|-----------|-----------|-----------|-----------|
| 31 | -10.61378      | -15.53641 | -5.691146 | -18.14230 | -3.085260 |
| 32 | -13.91638      | -21.06284 | -6.769925 | -24.84594 | -2.986819 |
| 33 | -17.21898      | -26.48348 | -7.954483 | -31.38781 | -3.050154 |
| 34 | -20.52158      | -31.81214 | -9.231025 | -37.78900 | -3.254166 |
If we compare the two forecasts by their RMSE values, Holt's linear method captures the behavior of these trending series better than the Simple Exponential Smoothing method.
Also, looking at the differences above, it is interesting to note that the point forecasts from Holt's method are larger than the SES forecasts, and the gap grows with the horizon, since Holt's method projects the estimated trend forward while SES produces a flat forecast; the prediction-interval bounds are shifted upward accordingly.
d)
Calculate a 95%
prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
First, let’s estimate the standard deviation from the Paperback residuals with some visualizations.
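A sketch of the diagnostics call for the Paperback residuals; the same call with fc.hh gives the Hardcover output shown further below:

checkresiduals(fc.hp)   # residual plots and Ljung-Box test for Holt's Paperback model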
##
## Ljung-Box test
##
## data: Residuals from Holt's method
## Q* = 15.081, df = 3, p-value = 0.001749
##
## Model df: 4. Total lags used: 7
From the above, the standard deviation of the residuals is: \(s\) = 31.44.
Now, let’s estimate the standard deviation from the Hardcover residuals with some visualizations.
##
## Ljung-Box test
##
## data: Residuals from Holt's method
## Q* = 9.416, df = 3, p-value = 0.02424
##
## Model df: 4. Total lags used: 7
From the above, the standard deviation of the residuals is: \(s\) = 27.66.
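A minimal sketch of the manual intervals, using the residual standard deviations quoted above (object names as assumed earlier):

s.hp <- sd(residuals(fc.hp))               # about 31.44 for Paperback
s.hh <- sd(residuals(fc.hh))               # about 27.66 for Hardcover
fc.hp$mean[1] + c(-1, 1) * 1.96 * s.hp     # manual 95% interval, Paperback
fc.hh$mean[1] + c(-1, 1) * 1.96 * s.hh     # manual 95% interval, Hardcover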
Let’s take a look at the following Paperback table:
|                                | Point Forecast | Lo 95      | Hi 95      |
|--------------------------------|----------------|------------|------------|
| 31 Holt's Paperback Predicted  | 209.4668       | 143.912986 | 275.020545 |
| 31 Holt's Paperback Manual     | 209.4668       | 147.839010 | 271.094521 |
| 31 Holt's Paperback Difference | 0.0000         | -3.926025  | 3.926025   |
Let’s take a look at the following Hardcover table:
|                                | Point Forecast | Lo 95      | Hi 95      |
|--------------------------------|----------------|------------|------------|
| 31 Holt's Hardcover Predicted  | 250.1739       | 192.922171 | 307.425574 |
| 31 Holt's Hardcover Manual     | 250.1739       | 195.963968 | 304.383776 |
| 31 Holt's Hardcover Difference | 0.0000         | -3.041797  | 3.041797   |
Previously, we calculated the values for SES; for simplicity, let’s recap the tables:
Here is the SES Paperback.
|                             | Point Forecast | Lo 95      | Hi 95      |
|-----------------------------|----------------|------------|------------|
| 31 SES Paperback Predicted  | 207.1097       | 138.867024 | 275.352306 |
| 31 SES Paperback Manual     | 207.1097       | 141.596372 | 272.622958 |
| 31 SES Paperback Difference | 0.0000         | -2.729348  | 2.729348   |
And here is the SES’ Hardcover table.
|                             | Point Forecast | Lo 95      | Hi 95      |
|-----------------------------|----------------|------------|------------|
| 31 SES Hardcover Predicted  | 239.5601       | 174.779871 | 304.340313 |
| 31 SES Hardcover Manual     | 239.5601       | 178.584829 | 300.535355 |
| 31 SES Hardcover Difference | 0.0000         | -3.804958  | 3.804958   |
7.7
For this exercise use data set eggs
, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt()
function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt()
so you can clearly see the differences between the various options when plotting the forecasts.]
First, let’s have a view of the data:
Now, let’s have a visualization of it.
Under normal circumstances, I would use h = 15, that is, forecast 15 years ahead, since the data cover a good number of years; but based on the hint, let's use h = 100 instead.
To compare, I calculated a Box-Cox lambda value of 0.3956183; since this value is close to 1/3, I will use the cube root as my transformation of the eggs series.
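A sketch of how this lambda can be obtained:

BoxCox.lambda(eggs)   # about 0.3956, close to 1/3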
eggs.BC <- eggs^(1/3)
Now, let’s compare the forecasting performance of the exponential smoothing methods.
fc1 <- holt(eggs, h=100)
fc2 <- holt(eggs, damped=TRUE, h=100)
fc3 <- holt(eggs.BC, h=100)
fc4 <- holt(eggs.BC, damped=TRUE, h=100)
Regular predictions.
Transformed predictions.
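A sketch of how the two comparison plots could be drawn from the four forecast objects above:

autoplot(eggs) +
  autolayer(fc1, series = "Holt", PI = FALSE) +
  autolayer(fc2, series = "Holt damped", PI = FALSE)
autoplot(eggs.BC) +
  autolayer(fc3, series = "Holt (cube root)", PI = FALSE) +
  autolayer(fc4, series = "Holt damped (cube root)", PI = FALSE)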
Something interesting about this behavior is that Holt's linear method eventually predicts a price per dozen eggs below $0, while Holt's damped method keeps the forecast flat. The same observation applies to the transformed series as well.
Now, let's use time series cross-validation to compare the one-step forecast accuracy of the four methods.
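A minimal sketch of the cross-validation behind the table below (the error object names are my own):

e1 <- tsCV(eggs, holt, h = 1)                     # Holt
e2 <- tsCV(eggs, holt, damped = TRUE, h = 1)      # Holt damped
e3 <- tsCV(eggs.BC, holt, h = 1)                  # Holt, cube-root scale
e4 <- tsCV(eggs.BC, holt, damped = TRUE, h = 1)   # Holt damped, cube-root scale
mean(e1^2, na.rm = TRUE)                          # MSE for the first method; MAE uses mean(abs(e1), na.rm = TRUE)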
|             | Holt      | HoltDamped | BCHolt    | BCHoltDamped |
|-------------|-----------|------------|-----------|--------------|
| Compare MSE | 821.37428 | 847.71684  | 0.0709629 | 0.0720348    |
| Compare MAE | 20.95148  | 21.67774   | 0.2019644 | 0.2037994    |
Something interesting from the above table is the suggestion that Holt's linear method performs "better" than the damped method; the problem is that Holt's linear method predicts prices below $0. Note also that the MSE and MAE for the transformed series are on the cube-root scale, so they cannot be compared directly with the untransformed values.
Which model gives the best RMSE?
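A sketch of how the training-set measures below can be pulled from the fitted models:

accuracy(fc1)   # Holt
accuracy(fc2)   # Holt damped
accuracy(fc3)   # Holt, cube root
accuracy(fc4)   # Holt damped, cube root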
|                          | ME         | RMSE       | MAE        | MPE        | MAPE      | MASE      | ACF1       |
|--------------------------|------------|------------|------------|------------|-----------|-----------|------------|
| Holt's                   | 0.0449909  | 26.5821882 | 19.1849063 | -1.1422014 | 9.653791  | 0.9463626 | 0.0134820  |
| Holt's Damped            | -2.8914955 | 26.5401853 | 19.2794985 | -2.9076331 | 10.018944 | 0.9510287 | -0.0031954 |
| Holt's Cubic Root        | -0.0000765 | 0.2474836  | 0.1837553  | -0.1700493 | 3.209218  | 0.9386606 | 0.0202290  |
| Holt's Cubic Root Damped | -0.0158566 | 0.2491243  | 0.1875869  | -0.4664749 | 3.290190  | 0.9582332 | 0.0115848  |
It is very interesting to note that, on the non-transformed data, the model with the best RMSE is Holt's damped method, while on the cube-root transformed data the plain Holt's method gives the best RMSE. This surprised me, since I was expecting the same behavior in both cases.
7.8
Recall your retail time series data (from Exercise 3 in Section 2.10).
a)
Why is multiplicative seasonality necessary for this series?
Let’s recall that series:
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349627V"], frequency = 12, start = c(1982, 4))   # monthly series starting Apr 1982, as in the book's retail example
My previously selected time series was A3349627V
; it represents liquor retailing turnover in New South Wales.
seas(myts, x11="")
autoplot(myts)
Now, from all the above visualizations, we can conclude as follows:
Preamble: The additive method is preferred when the seasonal variations are roughly constant through the series. The multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component.
Now, if we pay close attention to the above graphs, we can see that only the multiplicative description is satisfied here, since the seasonal variation is not roughly constant through the series; it increases with the level of the series.
b)
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
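A sketch of the two fits, with hypothetical object names and assuming myts as defined above:

fit.hw  <- hw(myts, seasonal = "multiplicative")                 # Holt-Winters multiplicative
fit.hwd <- hw(myts, seasonal = "multiplicative", damped = TRUE)  # with damped trend
autoplot(fit.hw)
autoplot(fit.hwd)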
c)
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
|                                    | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1       |
|------------------------------------|-----------|----------|----------|-----------|----------|-----------|------------|
| HW multiplicative forecasts        | 0.3076124 | 5.916806 | 4.201477 | 0.0724387 | 3.779909 | 0.4485563 | -0.0312262 |
| HW multiplicative forecasts Damped | 0.3511596 | 5.628031 | 4.018041 | 0.1776603 | 3.685373 | 0.4289724 | -0.0551768 |
From the above table, it seems that the damped HW multiplicative method gives better results (lower RMSE).
d)
Check that the residuals from the best method look like white noise.
For a white noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly equal to zero as there is some random variation. For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm \frac{2}{\sqrt{T}}\), where \(T\) is the length of the time series.
Let’s calculate this number:
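A sketch of one way to do this calculation, assuming fit.hwd is the selected (damped) model:

res   <- residuals(fit.hwd)
bound <- 2 / sqrt(length(res))                          # roughly 0.1 for this series
r     <- Acf(res, lag.max = 24, plot = FALSE)$acf[-1]   # autocorrelations, lag 0 dropped
mean(abs(r) > bound)                                    # proportion of spikes outside the bound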
From the calculations, about 2.6% of the autocorrelations lie outside the bound of \(\pm\) 0.1. Hence, although the residuals look very close to white noise, we cannot say they are exactly white noise in the strict sense.
e)
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naĂŻve approach from Exercise 8 in Section 3.7?
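A sketch of the train/test split and test-set accuracy, with hypothetical object names:

myts.train <- window(myts, end = c(2010, 12))
myts.test  <- window(myts, start = 2011)
fit.hw.tr  <- hw(myts.train, seasonal = "multiplicative", h = length(myts.test))
fit.hwd.tr <- hw(myts.train, seasonal = "multiplicative", damped = TRUE, h = length(myts.test))
accuracy(fit.hw.tr, myts.test)    # training and test measures, plain HW
accuracy(fit.hwd.tr, myts.test)   # training and test measures, damped HW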
|                                    | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1      |
|------------------------------------|-----------|----------|----------|-----------|----------|-----------|-----------|
| HW multiplicative forecasts        | 0.3301342 | 5.275059 | 3.825787 | 0.3442519 | 3.900028 | 0.4367368 | 0.0284223 |
| HW multiplicative forecasts Damped | 0.2243612 | 5.362182 | 3.837296 | 0.0562903 | 3.867764 | 0.4380506 | 0.0075028 |
Now, let's find the values from the seasonal naĂŻve approach from Exercise 8 in Section 3.7.
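A sketch of the benchmark, reusing the training window assumed above:

fit.sn <- snaive(myts.train, h = length(myts.test))   # seasonal naive forecasts
accuracy(fit.sn, myts.test)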
Here are the accuracy results:
HW multiplicative forecasts
|              | ME        | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1      | Theil's U |
|--------------|-----------|-----------|-----------|-----------|----------|-----------|-----------|-----------|
| Training set | 0.3301342 | 5.275059  | 3.825787  | 0.3442519 | 3.900028 | 0.4367368 | 0.0284223 | NA        |
| Test set     | 9.1057020 | 13.606928 | 10.885870 | 3.3900549 | 4.044825 | 1.2426880 | 0.5280136 | 0.3129365 |
HW multiplicative forecasts Damped
|              | ME         | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1      | Theil's U |
|--------------|------------|-----------|-----------|-----------|----------|-----------|-----------|-----------|
| Training set | 0.2243612  | 5.362182  | 3.837296  | 0.0562903 | 3.867764 | 0.4380506 | 0.0075028 | NA        |
| Test set     | 21.9487125 | 25.434378 | 22.070974 | 8.1823930 | 8.243831 | 2.5195354 | 0.6205859 | 0.5795209 |
Seasonal naĂŻve approach
|              | ME         | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1      | Theil's U |
|--------------|------------|-----------|-----------|-----------|----------|-----------|-----------|-----------|
| Training set | 0.2243612  | 5.362182  | 3.837296  | 0.0562903 | 3.867764 | 0.4380506 | 0.0075028 | NA        |
| Test set     | 21.9487125 | 25.434378 | 22.070974 | 8.1823930 | 8.243831 | 2.5195354 | 0.6205859 | 0.5795209 |
From the above tables, we can conclude that the HW multiplicative approach does indeed beat the seasonal naĂŻve approach on the test set.
7.9
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
To compare, I calculated a Box-Cox lambda value of -0.1452085; since this value is near 0, I will use the log as my transformation of the series.
myts.BC <- log(myts)
myts.train.BC <- log(myts.train)
Let’s visualize the components for the stl decomposition.
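A sketch of the decomposition call on the log-transformed training series:

fit.stl <- stl(myts.train.BC, s.window = "periodic", robust = TRUE)
autoplot(fit.stl)   # trend, seasonal and remainder components on the log scale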
Let’s take a look at the summary results given by ets.
## ETS(M,A,M)
##
## Call:
## ets(y = myts.train.BC)
##
## Smoothing parameters:
## alpha = 0.4425
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 3.7553
## b = 0.0047
## s = 0.9971 0.9778 1.0022 1.0881 1.0152 1.0042
## 0.9891 0.9857 0.9805 0.9787 0.9891 0.9925
##
## sigma: 0.0112
##
## AIC AICc BIC
## -32.28453 -30.34782 32.50487
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.0003987882 0.04879372 0.03639953 -0.001680687 0.8083399
## MASE ACF1
## Training set 0.4380407 0.04535694
Now, let’s visualize the components.
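A sketch of two plots that could be shown here: the fitted ETS components (the model summarized above) and an STL + ETS forecast produced with stlf, which combines the decomposition and ETS steps:

fit.ets <- ets(myts.train.BC)   # the ETS(M,A,M) model summarized above
autoplot(fit.ets)               # level, slope and seasonal components

fc.stl <- stlf(myts.train.BC, s.window = "periodic", method = "ets", h = length(myts.test))
autoplot(fc.stl)                # STL + ETS forecasts on the log scale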
From the above graphs, I do not identify much difference on the Box-Cox transformed series between the STL and ETS forecasting approaches. Comparing the forecasts to the original data, they follow the patterns quite well, capturing both the seasonality and the trend to a good approximation.