Exponential Smoothing
Libraries
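The library chunk isn't shown in the source, but given the functions used below (ses(), holt(), hw(), stlf(), grid.arrange(), read_excel()), it was likely something like this sketch:
library(fpp2)       # loads forecast and ggplot2, plus the pigs, books and eggs datasets
library(gridExtra)  # grid.arrange(), used for side-by-side plots in 7.6c
library(readxl)     # read_excel(), used for the retail data in 7.8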
Exercise 7.1
1. Consider the pigs series — the number of pigs slaughtered in Victoria each month.
a. Use the ses() function in R to find the optimal values of \(\alpha\) and \(l_{0}\), and generate forecasts for the next four months.
A cursory glance at the pigs dataset.
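The glimpse below was presumably produced by something like:
window(pigs, end = c(1980, 6)) # first six monthly observations
summary(pigs)                  # five-number summary plus the mean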
## Jan Feb Mar Apr May Jun
## 1980 76378 71947 33873 96428 105084 95741
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33873 79080 91662 90640 101493 120184
Now I'll apply ses() with forecast horizon h = 4, display the fitted model with summary(), and read off the optimal values of \(\alpha\) and \(l_{0}\).
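The summary output below (and the use of pigs_ses in part b) suggests:
pigs_ses <- ses(pigs, h = 4) # simple exponential smoothing, 4-step forecast
summary(pigs_ses)            # reports alpha, l0, error measures and the forecasts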
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
So, the optimal values are \(\alpha = 0.2971\) and \(l_{0} = 77260.0561\).
b. Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
s <- sd(pigs_ses$residuals) # standard deviation of the residuals
m <- pigs_ses$mean[1] # first point forecast
#
lb <- round(m - 1.96 * s, 2)
ub <- round(m + 1.96 * s, 2)
Computing the bounds of the interval below.
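The printed bounds presumably came from something like:
print(paste0("Lower bound CI = ", lb))
print(paste0("Upper bound CI = ", ub))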
## [1] "Lower bound CI = 78679.97"
## [1] "Upper bound CI = 118952.84"
R provides functions to directly compute the intervals.
rlb <- round(ses(pigs, h = 4, level = 95)$lower[1], 2)
rub <- round(ses(pigs, h = 4, level = 95)$upper[1], 2)
## [1] "R computed lower bound = 78611.97"
## [1] "R computed upper bound = 119020.84"
Now I'll inspect the differences between the manually computed and the R-computed bounds.
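A sketch of that comparison, reusing the bounds computed above:
print(paste0("Manually computed difference = ", ub - lb))
print(paste0("R computed difference = ", rub - rlb))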
## [1] "Manually computed difference = 40272.87"
## [1] "R computed difference = 40408.87"
So the R-computed interval is slightly wider than the manually computed one: R uses the model's estimate of \(\sigma\) (10308.58), which is a little larger than the raw standard deviation of the residuals.
Exercise 7.5
5. Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days' sales for paperback and hardcover books.
a. Plot the series and discuss the main features of the data.
A cursory glance at the books dataset.
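Presumably produced by something like:
window(books, end = 6) # first six days of sales
summary(books)         # distribution of each series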
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## Paperback Hardcover
## 1 199 139
## 2 172 128
## 3 111 172
## 4 209 139
## 5 161 191
## 6 119 168
## Paperback Hardcover
## Min. :111.0 Min. :128.0
## 1st Qu.:167.2 1st Qu.:170.5
## Median :189.0 Median :200.5
## Mean :186.4 Mean :198.8
## 3rd Qu.:207.2 3rd Qu.:222.0
## Max. :247.0 Max. :283.0
The plot of daily book sales.
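A sketch of the plotting call (title and labels are my guess):
autoplot(books) + ggtitle("Daily Book Sales") + xlab("Day") + ylab("Books Sold")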
Observations:
- Both paperback and hardcover sales trend upward.
- No seasonality is visible in the 30 days of data.
- In the first 10 days, paperback sales are higher, but from day 10 to the end of the month, hardcover sales overtake paperbacks.
b. Use the ses() function to forecast each series, and plot the forecasts.
For a comparative view.
autoplot(books[, "Paperback"], series = "Paperback") + autolayer(ses(books[, "Paperback"], h = 4), series = "Paperback") +
autolayer(books[, "Hardcover"], series = "Hardcover") + autolayer(ses(books[, "Hardcover"], h = 4), series = "Hardcover", PI = FALSE) +
ggtitle("Comparative forecasts from Daily Sales (SES)") + xlab("Day") + ylab("Books Sold")One important observation at this point is, although daily sales of both paperback and hardcover books are trending upward, the forcast doesn’t catch the trend.
c. Compute the RMSE values for the training data in each case.
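The two tables below (paperback first, then hardcover) were presumably produced by:
round(accuracy(ses(books[, "Paperback"], h = 4)), 2) # paperback training errors
round(accuracy(ses(books[, "Hardcover"], h = 4)), 2) # hardcover training errors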
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.18 33.64 27.84 0.47 15.58 0.7 -0.21
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.17 31.93 26.77 2.64 13.39 0.8 -0.14
Exercise 7.6
Note: I have paper and PDF copies of the book Forecasting: Principles and Practice, but the online edition differs in some places. One difference is the wording of question 7.6. I followed the online edition, which is often used during meetups.
6. We will continue with the daily sales of paperback and hardcover books in data set books.
a. Now apply Holt's linear method to the paperback and hardback series and compute four-day forecasts in each case.
autoplot(books[, "Paperback"], series = "Paperback") + autolayer(holt(books[, "Paperback"], h = 4), series = "Paperback") +
ggtitle("Forecasts from Daily Sales of Paperback (HOLT)") + xlab("Day") + ylab("Books Sales")
autoplot(books[, "Hardcover"], series = "Hardcover") + autolayer(holt(books[, "Hardcover"], h = 4), series = "Hardcover") +
ggtitle("Forecasts from Daily Sales of Hardcover (HOLT)") + xlab("Day") + ylab("Books Sales")
For a comparative view of Holt's forecasts:
autoplot(books[, "Paperback"], series = "Paperback") + autolayer(holt(books[, "Paperback"], h = 4), series = "Paperback") +
autolayer(books[, "Hardcover"], series = "Hardcover") + autolayer(holt(books[, "Hardcover"], h = 4), series = "Hardcover", PI = FALSE) +
ggtitle("Comparative forecasts from Daily Sales (SES)") + xlab("Day") + ylab("Books Sold")In Holt’s method, we observe that forecast caught the trend, slightly. In sales of paperbacks, the trend is not so perceptble, but in sales of hardcover the mild upward gradinent is perceptible.
b. Compare the RMSE measures of Holt's method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt's method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
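As before, the tables below (paperback first, then hardcover) were presumably produced by:
round(accuracy(holt(books[, "Paperback"], h = 4)), 2)
round(accuracy(holt(books[, "Hardcover"], h = 4)), 2)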
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -3.72 31.14 26.18 -5.51 15.58 0.66 -0.18
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.14 27.19 23.16 -2.11 12.16 0.69 -0.03
c. Compare the forecasts for the two series using both methods. Which do you think is best?
grid.arrange(autoplot(ses(books[, "Paperback"], h = 4)) + ggtitle("Forecasts Sales of Paperback (SES)") + xlab("Day") + ylab("Books Sales"),
autoplot(books[, "Paperback"], series = "Paperback") + autolayer(holt(books[, "Paperback"], h = 4), series = "Paperback") +
ggtitle("Forecasts Sales of Paperback (HOLT)") + xlab("Day") + ylab("Books Sales"), ncol = 2)
grid.arrange(autoplot(ses(books[, "Hardcover"], h = 4)) + ggtitle("Forecasts Sales of Hardcover (SES)") + xlab("Day") + ylab("Books Sales"),
autoplot(books[, "Hardcover"], series = "Hardcover") + autolayer(holt(books[, "Hardcover"], h = 4), series = "Hardcover") +
ggtitle("Forecasts Sales of Hardcover (HOLT)") + xlab("Day") + ylab("Books Sales"), ncol = 2)
Again, Holt's forecasts catch the trend slightly: barely perceptible for paperbacks, but clear for hardcovers.
So I think Holt's method is an improvement over the SES method; the lower training RMSEs (31.14 vs 33.64 for paperbacks, 27.19 vs 31.93 for hardcovers) support this.
d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
Prediction intervals for paperback
RMSE_indx <- 2 # RMSE is the second column of the accuracy() output; used in the next line.
s <- round(accuracy(holt(books[, "Paperback"], h = 4)), 2)[RMSE_indx]
print(paste0('Prediction interval using RMSE: ', "(", holt(books[, "Paperback"], h = 4)$mean[1] - 1.96 * s, ", ", holt(books[, "Paperback"], h = 4)$mean[1] + 1.96 * s, ")"))
## [1] "Prediction interval using RMSE: (148.432365521051, 270.501165521051)"
SES method.
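A sketch of how the ses() interval below can be extracted (the object name is mine; the holt() interval further down is analogous):
ses_pb <- ses(books[, "Paperback"], h = 4, level = 95)
print(paste0("SES method of interval: (", ses_pb$lower[1], ", ", ses_pb$upper[1], ")"))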
## [1] "SES method of interval: (138.867024106993, 275.352305883445)"
Holt’s method.
## [1] "SES method of interval: (143.912985820256, 275.020545221845)"
Prediction intervals for hardcover
RMSE_indx <- 2 # RMSE is the second column of the accuracy() output; used in the next line.
s <- round(accuracy(holt(books[, "Hardcover"], h = 4)), 2)[RMSE_indx]
print(paste0('Prediction interval using RMSE: ', "(", holt(books[, "Hardcover"], h = 4)$mean[1] - 1.96 * s, ", ", holt(books[, "Hardcover"], h = 4)$mean[1] + 1.96 * s, ")"))
## [1] "Prediction interval using RMSE: (196.881472212356, 303.466272212356)"
SES method.
## [1] "SES method of interval: (174.779870851487, 304.340313152036)"
Holt’s method.
## [1] "SES method of interval: (192.922170771941, 307.42557365277)"
Exercise 7.7
7. For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
A cursory glance at the eggs dataset.
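Presumably produced by something like:
window(eggs, end = 1905) # first six annual observations
summary(eggs)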
## Time Series:
## Start = 1900
## End = 1905
## Frequency = 1
## [1] 276.79 315.42 314.87 321.25 314.54 317.92
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 62.27 148.87 209.15 206.15 276.71 358.78
In order to get a rough idea of the price of eggs from 1900 to 1993, first let me look at the autoplot.
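A sketch of the plotting call (labels are my guess):
autoplot(eggs) + ggtitle("Price of eggs, 1900-1993") + xlab("Year") + ylab("Price")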
The price of eggs trended downwards in all those years.
Now, let me use Holt’s forecasting.
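Following the hint about the horizon, something like:
autoplot(holt(eggs, h = 100))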
Observation: while Holt's forecast correctly caught the downward trend, it overshoots: the price drops to zero and eventually becomes negative, which is absurd.
We learn from Forecasting: Principles and Practice, "The forecasts generated by Holt's linear method display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons".
This might be an example of such over-forecasting.
In order to alleviate this problem, Gardner & McKenzie introduced a parameter that dampens the trend to a flat line. We’ll observe this below.
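The damped variant, presumably:
autoplot(holt(eggs, h = 100, damped = TRUE))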
Clearly, the forecast flattened out and removed the absurdity of negative price.
In the context of Holt's various methods, one might suppose that the Holt-Winters method is also relevant. But it isn't applicable to the eggs dataset, because annual data has no seasonality. So I am not trying the Holt-Winters method.
Now, I’ll try with Box-Cox transformation.
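Matching the comparative chunk further down, presumably:
autoplot(holt(eggs, h = 100, lambda = BoxCox.lambda(eggs)))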
Observe that with the Box-Cox transformation, no part of the inflated blue prediction region falls below zero, presumably because the forecasts are back-transformed from the Box-Cox scale, which keeps them positive (much as a log transform would).
Now, in order to get a comparative view, I'll put them all in one graph.
autoplot(eggs) +
autolayer(holt(eggs, h = 100), series = 'Default', PI = F) +
autolayer(holt(eggs, h = 100, damped = T), series = 'Damped', PI = F) +
autolayer(holt(eggs, h = 100, exponential = T), series = 'Exponential', PI = F) +
autolayer(holt(eggs, lambda = BoxCox.lambda(eggs), h = 100), series = 'Box-Cox Transformed', PI = F) +
autolayer(holt(eggs, h = 100, exponential = T, damped = T), series = 'Damped & Exponential', PI = F) +
autolayer(holt(eggs, h = 100, damped = T, lambda = BoxCox.lambda(eggs), biasadj = T), series = 'Damped & Box-Cox', PI = F) +
ggtitle("Price of eggs 1900-93") + xlab("Years") + ylab("Price of eggs")We observed that by Holt’s (method without option), the price forecast can become negative. Damped alleviates the problem by flattening the line. Box-Cox and Exponential seem to run asymptotically along the x-axis.
Exercise 7.8
8. Recall your retail time series data (from Exercise 3 in Section 2.10).
a. Why is multiplicative seasonality necessary for this series?
My code from Exercise 3 in Section 2.10 is in the following code chunk.
retail_data <- read_excel("retail.xlsx", skip = 1)
myts <- ts(retail_data[, "A3349398A"], frequency = 12, start = c(1982, 4))
autoplot(myts) + ggtitle("Retail Sales")
The plot clearly shows that the variation increases with time. So, multiplicative seasonality is necessary.
b. Apply Holt-Winters' multiplicative method to the data. Experiment with making the trend damped.
Note: in the following, I used the code from the Holt-Winters section of the textbook and tweaked it to dampen the trend.
retail_myts <- window(myts, start = 1985)
fit1 <- hw(retail_myts, seasonal = "additive", h = 200)
fit2 <- hw(retail_myts, seasonal = "multiplicative", damped = TRUE, h = 200) # Added damped = TRUE
autoplot(retail_myts) +
autolayer(fit1, series = "HW additive forecasts", PI = FALSE) +
autolayer(fit2, series = "HW damped multiplicative forecasts", PI = FALSE) + xlab("Year") + ylab("Retail sales") +
ggtitle("Retail Sales") + guides(colour = guide_legend(title = "Forecast"))
Initially I tried with damped = TRUE alone, but the dampening was imperceptible. So I added h = 200 to make the dampening conspicuous.
c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
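A sketch of the comparison; I assume the undamped comparator is a plain multiplicative fit (fit_u is a name I'm introducing), while fit2 is the damped fit from above:
fit_u <- hw(retail_myts, seasonal = "multiplicative", h = 200) # undamped (assumption)
print(paste0("Undamped RMSE: ", accuracy(fit_u)[1, "RMSE"]))
print(paste0("Damped RMSE: ", accuracy(fit2)[1, "RMSE"]))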
## [1] "Undamped RMSE: 29.4305130687359"
## [1] "Damped RMSE: 29.6308699428669"
We observe that the undamped Holt-Winters RMSE is slightly lower than that of the damped method, and therefore it's a slightly better fit.
d. Check that the residuals from the best method look like white noise.
Residuals from both the undamped and damped Holt-Winters methods look like white noise.
The function checkresiduals() gives a better picture, so I tried it with the undamped method below.
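The Ljung-Box output below is consistent with something like:
checkresiduals(hw(retail_myts, seasonal = "multiplicative", h = 200))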
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 244.86, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
Although the residual distribution looks roughly normal, the Ljung-Box test strongly rejects the white-noise hypothesis (p-value < 2.2e-16), so the residuals still contain significant autocorrelation.
e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
Here's my code from Exercise 8 in Section 3.7, all in one code chunk.
myts.train <- window(myts, end = c(2010, 12))
myts.test <- window(myts, start = 2011)
#
fc <- snaive(myts.train)
#
accuracy(fc, myts.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 73.94114 88.31208 75.13514 6.068915 6.134838 1.000000 0.6312891
## Test set 115.00000 127.92727 115.00000 4.459712 4.459712 1.530576 0.2653013
## Theil's U
## Training set NA
## Test set 0.7267171
Now I'll try Holt-Winters' multiplicative method, once damped and once undamped.
Damped.
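The accuracy tables in this part were presumably produced along these lines (object name and horizon are my assumptions; the undamped run simply drops damped = TRUE):
fc_hw_damped <- hw(myts.train, seasonal = "multiplicative", damped = TRUE,
                   h = length(myts.test)) # horizon is an assumption
accuracy(fc_hw_damped, myts.test)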
## ME RMSE MAE MPE MAPE MASE
## Training set 3.777291 29.21058 21.82543 0.2857987 1.875953 0.2904823
## Test set 26.176874 26.17687 26.17687 1.0017555 1.001756 0.3483972
## ACF1
## Training set -0.05362944
## Test set NA
Undamped.
## ME RMSE MAE MPE MAPE MASE
## Training set 2.594132 29.87520 22.43516 0.2383033 1.864671 0.2985975
## Test set 31.399564 31.39956 31.39956 1.2016212 1.201621 0.4179079
## ACF1
## Training set -0.03687893
## Test set NA
Both of my HW RMSEs (damped and undamped) beat the RMSE of the seasonal naïve approach, in fact by a very wide margin.
Exercise 7.9
9. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
My prepared dataset from Exercise 8 in Section 3.7 is myts.train; I used the same one in 7.8e. In the code chunk below, I'll first apply the Box-Cox transformation and then do the STL decomposition. Although this can be done in three or more steps, I'll do it in one step for simplicity.
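A sketch of that one-step chunk (the variable name matches the prose below; the exact arguments are my assumption):
# stlf() decomposes the Box-Cox transformed series with STL, forecasts the
# seasonally adjusted component (with ETS by default), and reseasonalises.
stlf_boxcox_myts.train <- stlf(myts.train, lambda = BoxCox.lambda(myts.train))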
And here is the ETS model on the seasonally adjusted data. From the wording of problem 7.9, I first thought I would have to apply ets() to stlf_boxcox_myts.train, but that didn't work. So I interpreted it to mean what I did below.
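A minimal sketch of my interpretation (object names and arguments are assumptions, and I assume myts.train is a univariate ts):
ets_seasadj <- ets(seasadj(stl(myts.train, s.window = "periodic"))) # ETS on the seasonally adjusted series
fc_ets <- forecast(ets_seasadj, h = length(myts.test))              # forecast over the test period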
Now, in the following, I'll compare the accuracies of the Box-Cox STLF forecasts and the ETS forecasts.
Box-Cox STLF first.
## ME RMSE MAE MPE MAPE MASE
## Training set -1.077737 26.34835 19.95571 -0.020850 1.675349 0.2655976
## Test set -101.367987 117.88372 101.84801 -3.846709 3.865079 1.3555310
## ACF1 Theil's U
## Training set -0.04737883 NA
## Test set 0.52778217 0.656884
ETS next.
## ME RMSE MAE MPE MAPE MASE
## Training set 2.007835 29.33086 21.00199 0.133939 1.731450 0.2797569
## Test set -38.373703 154.24915 113.08350 -1.798220 4.264931 1.5063282
## ACF1 Theil's U
## Training set -0.02682440 NA
## Test set 0.07396657 0.8637663
It's clear that the Box-Cox STLF model outperforms the ETS model on test-set RMSE (117.88 vs 154.25).
But neither beats the best previous forecast on the test set: the damped Holt-Winters model in 7.8e had a test RMSE of about 26.18.