Forecasting HW2 - Hyndman Ch 7 (Exp Smoothing)

Chapter 7 - Exponential Smoothing

Question 1

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books (data set books).

library(fma)

## Warning: package 'fma' was built under R version 3.4.4

## Loading required package: forecast

## Warning: package 'forecast' was built under R version 3.4.3

bk<-(books)
str(bk)

##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"

Plot the series and discuss the main features of the data.

plot(bk)

# cor(bk[,1],bk[,2]) 
# 0.223

Both hardcover and paperback books have a positive trend and both exhibit high variations/ fluctuations in daily sales. This may be due to sales spiking at certain periods of the week, probably at weekends. Also, some spikes in Hardcover sales may mean trough for hardcover and vice versa.
Note: 30 days however is a very short period to look at though(other than weekday/weekend trends); in an attempt to identifying the trends or patterns and predicting future values.

Use simple exponential smoothing with the ses function (setting initial=“simple”) and explore different values of α for the paperback series. Record the within-sample SSE for the one-step forecasts. Plot SSE against α and find which value of α works best. What is the effect of α on the forecasts?

# Exponential smoothing for paperback series
fit.es <- ses(bk[,1], initial='simple', alpha=0.2, h=3)
sum((bk[,1] - fitted(fit.es)))

## [1] 51.94123

alpha <- c(0.2,0.3, 0.4,0.5, 0.6,0.7,0.75, 0.8, 0.9, 0.85,  0.81, 1)
sse.pb<-NULL
for (i in 1:length(alpha)) {
  sse.pb[i]=sum((bk[,1] - fitted(ses(bk[,1], initial='simple', alpha=alpha[i], h=1))))
}

#identify()
plot(alpha, sse.pb)

alpha[which.min(sse.pb)]

## [1] 0.75

We can thus deduce that alpha of around 0.75 gives the least sum of sq errors.

Now let ses select the optimal value of α. Use this value to generate forecasts for the next four days. Compare your results with b.

fit1 <- ses(bk[,1], initial='simple', h=4)
fit2 <- ses(bk[,1], initial='simple', alpha=0.75, h=4)
summary(fit1); summary(fit2)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = bk[, 1], h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.2125 
## 
##   Initial states:
##     l = 199 
## 
##   sigma:  34.7918
## Error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 1.749509 34.79175 28.64424 -2.770157 16.56938 0.7223331
##                    ACF1
## Training set -0.1268119
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       210.1537 165.5663 254.7411 141.9631 278.3443
## 32       210.1537 164.5706 255.7368 140.4404 279.8671
## 33       210.1537 163.5962 256.7112 138.9501 281.3573
## 34       210.1537 162.6418 257.6657 137.4905 282.8170

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = bk[, 1], h = 4, initial = "simple", alpha = 0.75) 
## 
##   Smoothing parameters:
##     alpha = 0.75 
## 
##   Initial states:
##     l = 199 
## 
##   sigma:  41.245
## Error measures:
##                    ME     RMSE      MAE       MPE     MAPE    MASE
## Training set 1.555328 41.24503 34.68559 -2.781873 19.65236 0.87468
##                    ACF1
## Training set -0.4172491
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       233.9949 181.1372 286.8525 153.1561 314.8337
## 32       233.9949 167.9228 300.0669 132.9464 335.0433
## 33       233.9949 156.9423 311.0475 116.1531 351.8366
## 34       233.9949 147.3422 320.6475 101.4710 366.5187

par(mfrow=c(2,1))
plot(fit1, main="Automatic alpha")
plot(fit2, main="Custom alpha: 0.75")

The fit1 model predicts lower forecasts than the fit2 model. The prediction intervals are also smaller for the fit1 model that ses() chooses automatically.

Repeat but with initial=“optimal”. How much difference does an optimal initial level make?

fit3 <- ses(bk[,1], initial='optimal', h=4)
sum((bk[,1] - fitted(fit3)))

## [1] 215.2864

The model is more pessimistic than those above and has a higher SSE.

Repeat steps (b)–(d) with the hardcover series.

# Exponential smoothing for hardcover series
alpha <- c(0.1,0.2,0.3, 0.4,0.5, 0.6,0.7,0.75, 0.8, 0.9, 0.85, 0.81, 1)   # same as above
sse.hc<-NULL
for (i in 1:length(alpha)) {
  sse.hc[i]=sum((bk[,2] - fitted(ses(bk[,2], initial='simple', alpha=alpha[i], h=4))))
}

plot(alpha, sse.hc)

fit1 <- ses(bk[,2], initial='simple', h=4)
fit2 <- ses(bk[,2], initial='simple', alpha=0.75, h=4)

par(mfrow=c(2,1))
plot(fit1, main="Automatic alpha")
plot(fit2, main="Custom alpha: 0.75")

fit3 <- ses(bk[,2], initial='optimal', h=4)
sum((bk[,2] - fitted(fit3)))

## [1] 275.0075

As alpha increases the SSE declines with diminishing returns. Letting the SES function automatically choose alpha gives lower point forecasts than the previous custom alpha produced. Setting the intial paramater to ‘optimal’ gives very similar results to ‘simple’.

Question 2

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

Compare the SSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. Discuss the merits of the two forecasting methods for these data sets.

holt.pb <- holt(bk[,1], h=4, initial = "simple")
holt.hc <- holt(bk[,2], h=4, initial = "simple")

sse.holt.pb=sum((bk[,1] - fitted(holt.pb)))
sse.holt.hc=sum((bk[,2] - fitted(holt.hc)))

sse.holt.pb ; sse.holt.hc

## [1] 233.0953

## [1] 215.798

SSE is higher for paperback using Holts linear trend method and lower with simple exponential smoothing technique.
SSE is higher for harcover books using simple exponential smoothing technique and lower with Holts linear trend method.

Exponential Smoothing
Merits

It is easy to learn and apply.
It requires storing very little data.
It gives more significance to recent observations.
It’s best suited for short-term forecasting as it assumes future patterns and trends will look like current patterns and trends.

Limitations

It cannot handle trends well.

Compare the forecasts for the two series using both methods. Which do you think is best?
Answer
SSE is higher for paperback using Holts linear trend method and lower with simple exponential smoothing technique.
SSE is higher for harcover books using simple exponential smoothing technique and lower with Holts linear trend method.

Calculate a 95% prediction interval for the first forecast for each series using both methods, assuming normal errors. Compare your forecasts with those produced by R.

summary(fit1) ; summary(fit3)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = bk[, 2], h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.3473 
## 
##   Initial states:
##     l = 139 
## 
##   sigma:  32.0198
## Error measures:
##                   ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.72952 32.01982 26.34467 3.104211 13.05063 0.7860035
##                    ACF1
## Training set -0.1629042
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       240.3808 199.3457 281.4158 177.6231 303.1385
## 32       240.3808 196.9410 283.8206 173.9453 306.8162
## 33       240.3808 194.6625 286.0990 170.4608 310.3008
## 34       240.3808 192.4924 288.2691 167.1418 313.6197

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = bk[, 2], h = 4, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2836 
## 
##   sigma:  31.931
## 
##      AIC     AICc      BIC 
## 315.8506 316.7737 320.0542 
## 
## Error measures:
##                    ME     RMSE     MAE      MPE     MAPE      MASE
## Training set 9.166918 31.93101 26.7731 2.636328 13.39479 0.7987858
##                    ACF1
## Training set -0.1417817
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5602 198.6390 280.4815 176.9766 302.1439
## 32       239.5602 196.4905 282.6299 173.6908 305.4297
## 33       239.5602 194.4443 284.6762 170.5613 308.5591
## 34       239.5602 192.4869 286.6336 167.5677 311.5527

summary(holt.pb); summary(holt.hc)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = bk[, 1], h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.2984 
##     beta  = 0.4984 
## 
##   Initial states:
##     l = 199 
##     b = -27 
## 
##   sigma:  39.5463
## Error measures:
##                    ME     RMSE     MAE      MPE     MAPE      MASE
## Training set 7.769844 39.54634 33.5377 1.633306 18.19621 0.8457332
##                    ACF1
## Training set -0.1088681
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 31       222.0201 171.3394 272.7007 144.51068 299.5295
## 32       229.6904 164.8872 294.4935 130.58245 328.7983
## 33       237.3606 145.1175 329.6038  96.28696 378.4343
## 34       245.0309 115.5211 374.5407  46.96280 443.0991

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = bk[, 2], h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.439 
##     beta  = 0.1574 
## 
##   Initial states:
##     l = 139 
##     b = -11 
## 
##   sigma:  35.0438
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 7.193267 35.04383 27.99174 2.423793 14.18241 0.8351445
##                     ACF1
## Training set -0.07743714
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.7889 205.8784 295.6993 182.1042 319.4735
## 32       254.7003 202.4087 306.9918 174.7273 334.6733
## 33       258.6117 196.3181 320.9052 163.3419 353.8815
## 34       262.5231 187.9903 337.0558 148.5350 376.5111

Question 3

For this exercise, use the quarterly UK passenger vehicle production data from 1977:1–2005:1 (data set ukcars).

Plot the data and describe the main features of the series.

#install.packages("expsmooth")
library(expsmooth)

## Warning: package 'expsmooth' was built under R version 3.4.4

cars <- ukcars
plot(ukcars)

There is a strong seasonal effect throughout the series. The trend was declining until the early 80s then it started increasing until 2000. Production recovered in a year or two.

Decompose the series using STL and obtain the seasonally adjusted data.

decomposed <- stl(cars, s.window="periodic", robust=TRUE)
seasonal <- decomposed$time.series[,1]

cars_sa <- cars - seasonal

Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. Then reseasonalize the forecasts. Record the parameters of the method and report the RMSE of the one-step forecasts from your method.

holt1 <- holt(cars_sa, h=8, damped = TRUE)

lastyear <- rep(decomposed$time.series[110:113,"seasonal"],2)
reseasonalized_fc <- holt1$mean + lastyear

summary(holt1)

## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = cars_sa, h = 8, damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.5666 
##     beta  = 3e-04 
##     phi   = 0.9117 
## 
##   Initial states:
##     l = 346.0865 
##     b = -9.7583 
## 
##   sigma:  25.2032
## 
##      AIC     AICc      BIC 
## 1275.490 1276.283 1291.854 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 2.518454 25.20318 20.5804 0.3038991 6.585405 0.6707052
##                   ACF1
## Training set 0.0353549
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       407.4117 375.1125 439.7109 358.0144 456.8090
## 2005 Q3       407.4110 370.2834 444.5387 350.6292 464.1929
## 2005 Q4       407.4104 366.0103 448.8105 344.0944 470.7264
## 2006 Q1       407.4099 362.1359 452.6838 338.1693 476.6504
## 2006 Q2       407.4094 358.5654 456.2533 332.7090 482.1097
## 2006 Q3       407.4089 355.2367 459.5811 327.6184 487.1994
## 2006 Q4       407.4085 352.1062 462.7108 322.8309 491.9860
## 2007 Q1       407.4081 349.1421 465.6741 318.2979 496.5182

Forecast the next two years of the series using Holt’s linear method applied to the seasonally adjusted data. Then reseasonalize the forecasts. Record the parameters of the method and report the RMSE of of the one-step forecasts from your method.

holt2 <- holt(cars_sa, h=8)
reseasonalized_fc <- holt2$mean + lastyear
summary(holt2)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = cars_sa, h = 8) 
## 
##   Smoothing parameters:
##     alpha = 0.6012 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 343.3854 
##     b = 0.6617 
## 
##   sigma:  25.3907
## 
##      AIC     AICc      BIC 
## 1275.166 1275.726 1288.803 
## 
## Error measures:
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.1407116 25.39072 20.14514 -0.5931913 6.500319 0.6565204
##                    ACF1
## Training set 0.02953472
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       408.3701 375.8306 440.9096 358.6052 458.1350
## 2005 Q3       409.0302 371.0610 446.9993 350.9614 467.0990
## 2005 Q4       409.6903 366.9747 452.4058 344.3625 475.0181
## 2006 Q1       410.3503 363.3641 457.3366 338.4911 482.2096
## 2006 Q2       411.0104 360.1093 461.9116 333.1638 488.8570
## 2006 Q3       411.6705 357.1336 466.2074 328.2635 495.0774
## 2006 Q4       412.3306 354.3845 470.2766 323.7098 500.9514
## 2007 Q1       412.9906 351.8241 474.1572 319.4445 506.5368

Now use ets() to choose a seasonal model for the data.

fit3 <- ets(cars_sa)
summary(fit3)

## ETS(A,N,N) 
## 
## Call:
##  ets(y = cars_sa) 
## 
##   Smoothing parameters:
##     alpha = 0.6115 
## 
##   Initial states:
##     l = 319.5835 
## 
##   sigma:  25.2942
## 
##      AIC     AICc      BIC 
## 1270.304 1270.525 1278.487 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.268464 25.29416 20.22177 -0.1453187 6.505687 0.6590176
##                   ACF1
## Training set 0.0263301

# ETS(A,N,N)
# RMSE: 25.29416

Compare the RMSE of the fitted model with the RMSE of the model you obtained using an STL decomposition with Holt’s method. Which gives the better in-sample fits?

Answer
All models have all most identical RMSE values.

Compare the forecasts from the two approaches? Which seems most reasonable?

predict(fit3)

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       407.2388 374.8231 439.6546 357.6632 456.8145
## 2005 Q3       407.2388 369.2421 445.2355 349.1279 465.3498
## 2005 Q4       407.2388 364.3819 450.0958 341.6948 472.7829
## 2006 Q1       407.2388 360.0193 454.4584 335.0228 479.4549
## 2006 Q2       407.2388 356.0270 458.4507 328.9171 485.5606
## 2006 Q3       407.2388 352.3242 462.1535 323.2541 491.2235
## 2006 Q4       407.2388 348.8557 465.6219 317.9496 496.5281
## 2007 Q1       407.2388 345.5821 468.8956 312.9430 501.5347

plot(holt1); plot(holt2) ; plot(fit3)

Holt’s linear method appears to be the most reasonable as it shows some trend.

Question 4

For this exercise, use the monthly Australian short-term overseas visitors data, May 1985–April 2005. (Data set: visitors.)

Make a time plot of your data and describe the main features of the series.

plot(visitors)

There is a positive linear trend, seasonality throughout the series and increasing variance.

Forecast the next two years using Holt-Winters’ multiplicative method.

two_years <- hw(visitors, h=24, seasonal='multiplicative')

Why is multiplicative seasonality necessary here?

Answer
It is necessary because the plot shows that the variance increases through the series.

Experiment with making the trend exponential and/or damped.

two_years1 <- hw(visitors, h=24, seasonal='multiplicative', damped=TRUE)
two_years2 <- hw(visitors, h=24, seasonal='multiplicative', exponential=TRUE)

Compare the RMSE of the one-step forecasts from the various methods. Which do you prefer?

plot(two_years,ylab="Monthly Australian visitors", plot.conf=FALSE,
     fcol="white", xlab="Year")

## Warning in plot.window(xlim, ylim, log, ...): "plot.conf" is not a
## graphical parameter

## Warning in title(main = main, xlab = xlab, ylab = ylab, ...): "plot.conf"
## is not a graphical parameter

## Warning in axis(1, ...): "plot.conf" is not a graphical parameter

## Warning in axis(2, ...): "plot.conf" is not a graphical parameter

## Warning in box(...): "plot.conf" is not a graphical parameter

lines(fitted(two_years), col="red", lty=2)
lines(fitted(two_years1), col="green", lty=2)
lines(fitted(two_years2), col='blue', lty=2)
lines(two_years$mean, type="o", col="red")
lines(two_years1$mean, type="o", col="green")
lines(two_years2$mean, type="o", col="blue")
legend("topleft",lty=1, pch=1, col=1:3,
       c("data","H-W Multiplicative", 'Damped H-W Multiplicative',
         'H-W Exponential'))

Now fit each of the following models to the same data:

  i. a multiplicative Holt-Winters' method;  
  ii. an ETS model;   
  iii. an additive ETS model applied to a Box-Cox transformed series;   
  iv. a seasonal naive method applied to the Box-Cox transformed series;   
  v. an STL decomposition applied to the Box-Cox transformed data followed by an ETS model applied to the seasonally adjusted (transformed) data.

# a) a multiplicative Holt-Winters' method

fit1 <- holt(visitors, seasonal='multiplicative')

# b) an ETS model

fit2 <- ets(visitors)

# c) an additive ETS model applied to a Box-Cox transformed series

lambda <- BoxCox.lambda(visitors)
boxcox_visitors <- BoxCox(visitors, lambda)
fit3 <- ets(boxcox_visitors, model='AAZ')

# d) a seasonal naive method applied to the Box-Cox transformed series

fit4 <- snaive(boxcox_visitors)

# e)  an STL decomposition applied to the Box-Cox transformed data followed by an ETS model
#     applied to the seasonally adjusted (transformed) data. 

decomposed <- decompose(boxcox_visitors)
seasonal_visitors <- decomposed$seasonal

fit5 <- ets(seasonal_visitors)

All models have all most identical RMSE values.

For each model, look at the residual diagnostics and compare the forecasts for the next two years. Which do you prefer?

plot((fitted(fit1)), residuals(fit1))

# Unbiased and heteroskedastic.

plot((fitted(fit2)), residuals(fit2))

# Unbiased and homoskedastic.

plot((fitted(fit3)), residuals(fit3))

# Slightly biased and borderline heteroskedastic.

plot((fitted(fit4)), residuals(fit4))

# Biased and heteroskedastic.

plot((fitted(fit5)), residuals(fit5))

# Biased and heteroskedastic.

plot(forecast(fit1, h=10))

plot(forecast(fit2, h=10))

The second model (ETS) has the best looking residual plot and it’s forecasts appear plausible (as does fit 1 but fit 1 has heteroskedasticity present in the residuals.)