Chapter 7: Exponential Smoothing

library(tidyr)
library(dplyr)
library(knitr)
library(utils)
library(ggplot2)
library(mlbench)
library(corrplot)
library(purrr)
library(mice)
library(reshape2)
library(forecast)
library(fpp2)
library(gridExtra)
library(openxlsx)

Exercise 7.1

Consider the \(pigs\) series – the number of pigs slaughtered in Victoria each month.

  1. Use the ses() function in R to find the optimal values of \(\alpha\) and \(l_{0}\), and generate forecasts for the next four months.
# Examine the series: time plot, ACF and PACF
tsdisplay(pigs)

# Build forecasting model using the ses() function from forecast package:
fc = ses(pigs, h=4)

# Compare time series and fitted model
autoplot(fc) +
  autolayer(fitted(fc), series="Fitted") +
  ylab("Count") + xlab("Year")

# Show model's alpha parameter computed by ses()
cat("alpha =", fc$model$par[1])
## alpha = 0.2971488
# Show model's l0 parameter computed by ses()
cat("l0 =", fc$model$par[2])
## l0 = 77260.06
# Predictions for the next 4 months (from h=4 above):
fc
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8
  2. Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
# First plot the residuals
par(mfrow=c(1,2))
plot(fc$residuals, main="Residuals", ylab="Residual")
hist(fc$residuals, main="Residuals Histogram", xlab="Residual")

From the plots above, the residuals appear to be approximately normally distributed with a mean close to zero.
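As an additional check, the forecast package's checkresiduals() summarises the same residual diagnostics and adds a Ljung-Box test; a minimal sketch:

# Time plot, ACF, histogram and Ljung-Box test of the ses() residuals
checkresiduals(fc)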
sdr = sd(fc$residuals)
cat("Standard deviation of Residuals =", sdr)
## Standard deviation of Residuals = 10273.69
# Prediction interval using ses():
ses.prediction.interval = list(fc$lower[1,2], fc$upper[1,2])
unlist(ses.prediction.interval)
##       95%       95% 
##  78611.97 119020.84
# Prediction interval using formula:
ll = fc$mean[1] - 1.96 * sdr
hh = fc$mean[1] + 1.96 * sdr
calc.prediction.interval = list(ll, hh)
unlist(calc.prediction.interval)
## [1]  78679.97 118952.84
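The hand-computed interval (78680, 118953) is close to, but slightly narrower than, the interval produced by R (78612, 119021). The small difference arises mainly because ses() bases its interval on the model's estimated standard deviation rather than on sd() of the residuals.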

Exercise 7.5

Data set \(books\) contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

  1. Plot the series and discuss the main features of the data.
par(mfrow=c(1,1))
autoplot(books) +
    xlab("Days") + 
    ylab("Number of books") +
    ggtitle("Daily sales of Paperback & Hardcover books")

The data set \(books\) is a bivariate time series. Both component series, Paperback and
Hardcover, show a clear upward trend. Both also show large day-to-day fluctuations in sales;
with only 30 daily observations and no defined seasonal frequency, however, these fluctuations
are hard to attribute to genuine seasonality.
  2. Use the ses() function to forecast each series, and plot the forecasts.
# Extract the two component series from the bivariate books series
pts = books[, "Paperback"]
hts = books[, "Hardcover"]

# ses() forecast for Paperback books
fcp.ses = ses(pts, h=4)
p1 = autoplot(fcp.ses) +
    xlab("Time") + ylab("Paperback Books") + ggtitle("SES Forecast for Paperbacks")

# ses() forecast for Hardcover books
fch.ses = ses(hts, h=4)
p2 = autoplot(fch.ses) +
    xlab("Time") + ylab("Hardcover Books") + ggtitle("SES Forecast for Hardcovers")

grid.arrange(p1, p2, nrow=1)

  3. Compute the RMSE values for the training data in each case.
(fcp.ses.acc = accuracy(fcp.ses))
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
(fch.ses.acc = accuracy(fch.ses))
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763

Exercise 7.6

  1. Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
fcp.holt = holt(pts, h=4)
(fcp.holt.acc = accuracy(fcp.holt))
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
fcp.holt
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735
fch.holt = holt(hts, h=4)
(fch.holt.acc = accuracy(fch.holt))
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
fch.holt
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334
  2. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
# Collect the training RMSE of each method for each series
rmse = data.frame(Paperback = c(fcp.ses.acc[, "RMSE"], fcp.holt.acc[, "RMSE"]),
                  Hardcover = c(fch.ses.acc[, "RMSE"], fch.holt.acc[, "RMSE"]))
rownames(rmse) = c("SES", "Holt")

# Compare RMSE for both series using both methods:
kable(rmse, caption = "RMSE for Books using SES and Holt:")

RMSE for Books using SES and Holt:

       Paperback   Hardcover
SES     33.63769    31.93101
Holt    31.13692    27.19358

For both series, Holt's method gives a lower training RMSE than SES.
  3. Compare the forecasts for the two series using both methods. Which do you think is best?
f1 = autoplot(fcp.ses) +
    ggtitle("SES Forecasts") +
    xlab("Days") +
    ylab("Books") +
    autolayer(fitted(fcp.ses), series="SES Paperback") +
    autolayer(fitted(fch.ses), series="SES Hardcover") + coord_fixed(ratio=1/4)


f2 = autoplot(fcp.holt) +
    ggtitle("Holt Forecasts") +
    xlab("Days") +
    ylab("Books") +
    autolayer(fitted(fcp.holt), series="Holt Paperback") +
    autolayer(fitted(fch.holt), series="Holt Hardcover") + coord_fixed(ratio=2/8)

grid.arrange(f1, f2, nrow=1)

Holt's method gives the lower training RMSE for both series, and its forecasts continue the clear
upward trend in the data, whereas the SES forecasts are flat. For these data sets Holt's method is
therefore the better choice, even though the improvement in RMSE is modest.
  4. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
# 95% prediction intervals reported by ses() and holt():
ses.prediction.interval = list(fcp.ses$lower[1,2], fcp.ses$upper[1,2])
cat("95% interval for Paperback using SES:", unlist(ses.prediction.interval))
## 95% interval for Paperback using SES: 138.867 275.3523
ses.prediction.interval = list(fch.ses$lower[1,2], fch.ses$upper[1,2])
cat("95% interval for Hardcover using SES:", unlist(ses.prediction.interval))
## 95% interval for Hardcover using SES: 174.7799 304.3403
holt.prediction.interval = list(fcp.holt$lower[1,2], fcp.holt$upper[1,2])
cat("95% interval for Paperback using Holt:", unlist(holt.prediction.interval))
## 95% interval for Paperback using Holt: 143.913 275.0205
holt.prediction.interval = list(fch.holt$lower[1,2], fch.holt$upper[1,2])
cat("95% interval for Hardcover using Holt:", unlist(holt.prediction.interval))
## 95% interval for Hardcover using Holt: 192.9222 307.4256
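The question also asks for intervals computed by hand from the training RMSE, assuming normal errors; the code above only shows the intervals produced by ses() and holt(). A minimal sketch of that calculation, reusing the point forecasts and accuracy objects created above:

# Hand-computed 95% intervals: point forecast +/- 1.96 * training RMSE
fcp.ses$mean[1]  + c(-1, 1) * 1.96 * fcp.ses.acc[, "RMSE"]   # Paperback, SES
fch.ses$mean[1]  + c(-1, 1) * 1.96 * fch.ses.acc[, "RMSE"]   # Hardcover, SES
fcp.holt$mean[1] + c(-1, 1) * 1.96 * fcp.holt.acc[, "RMSE"]  # Paperback, Holt
fch.holt$mean[1] + c(-1, 1) * 1.96 * fch.holt.acc[, "RMSE"]  # Hardcover, Holt

Small differences from the packaged intervals are expected, since ses() and holt() base their intervals on the model's estimated standard deviation rather than on the raw training RMSE.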

Exercise 7.7

For this exercise use data set \(eggs\), the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

eggs.holt.1 = holt(eggs, h=100, initial="optimal", damped=TRUE)
(m1 = eggs.holt.1$model)
## Damped Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100, damped = TRUE, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.8462 
##     beta  = 0.004 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 276.9842 
##     b = 4.9966 
## 
##   sigma:  27.2755
## 
##      AIC     AICc      BIC 
## 1055.458 1056.423 1070.718
(rmse1 = accuracy(eggs.holt.1)[2])
## [1] 26.54019
eggs.holt.2 = holt(eggs, h=100, initial="optimal", damped=TRUE, alpha=0.5, beta=0.001, phi=0.9)
(m2 = eggs.holt.2$model)
## Damped Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100, damped = TRUE, initial = "optimal", alpha = 0.5,
##       beta = 0.001, phi = 0.9) 
## 
##   Smoothing parameters:
##     alpha = 0.5 
##     beta  = 0.001 
##     phi   = 0.9 
## 
##   Initial states:
##     l = 295.8355 
##     b = 0.2435 
## 
##   sigma:  28.7827
## 
##      AIC     AICc      BIC 
## 1059.569 1059.836 1067.199
(rmse2 = accuracy(eggs.holt.2)[2])
## [1] 28.0067
autoplot(eggs.holt.1) + xlab("Year") + ylab("Price per dozen") +
    autolayer(fitted(eggs.holt.1), series="Fitted (auto parameters)") +
    autolayer(fitted(eggs.holt.2), series="Fitted (manual parameters)")

# Box-Cox transformation: estimate lambda and inspect the transformed series
lambda = BoxCox.lambda(eggs)
eggs.bc.1 = BoxCox(eggs, lambda = lambda)
autoplot(eggs.bc.1)
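Of the two damped models fitted above, the automatically optimised one has the lower training RMSE (26.54 versus 28.01). To see how a Box-Cox transformation changes the forecasts and the RMSE, holt() can be refitted with its lambda argument; a minimal sketch (the object names eggs.holt.bc and eggs.holt.bcd are ours):

# Holt with a Box-Cox transformation, with and without damping,
# compared on training RMSE against the two earlier models
eggs.holt.bc  = holt(eggs, h=100, lambda=lambda)
eggs.holt.bcd = holt(eggs, h=100, lambda=lambda, damped=TRUE)
rbind(auto.damped   = accuracy(eggs.holt.1)[2],
      manual.damped = accuracy(eggs.holt.2)[2],
      boxcox        = accuracy(eggs.holt.bc)[2],
      boxcox.damped = accuracy(eggs.holt.bcd)[2])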

Exercise 7.8

Recall your retail time series data (from Exercise 3 in Section 2.10).

file2 = "retail.xlsx"
retaildata <- read.xlsx(file2, sheet=1, startRow=2)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
ggseasonplot(myts)

  1. Why is multiplicative seasonality necessary for this series?

    From the seasonal plot above we see that the amplitude of the seasonal variation grows with the level of the series, i.e. the seasonal swings get wider over time. The seasonal effect is therefore roughly proportional to the level rather than constant, so an additive model would be inadequate and multiplicative seasonality is needed.

  2. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit1 = hw(myts, seasonal="multiplicative", h=1)
fit2 = hw(myts, seasonal="multiplicative", damped=TRUE, h=1)

autoplot(myts) +
    autolayer(fit1, series="HW multiplicative forecasts", PI=FALSE) +
    autolayer(fit2, series="HW damped trend multiplicative forecasts", PI=FALSE) +
    xlab("Time") +
    ylab("Retail data") +
    guides(colour=guide_legend(title="Forecast"))

  3. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
accuracy(fit1)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.1170648 13.29378 8.991856 -0.1217735 3.918351 0.4748948
##                    ACF1
## Training set 0.08635577
accuracy(fit2)
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 1.414869 13.30494 9.042151 0.6105987 3.959617 0.4775511
##                    ACF1
## Training set 0.04077895
The RMSE values of the two methods are almost identical (13.29 for the undamped model versus 13.30 for the damped model), so there is little to choose between them on the training data.
  4. Check that the residuals from the best method look like white noise.
par(mfrow=c(1,2))
plot(fit1$residuals, main="HW multiplicative", ylab="Residual")
plot(fit2$residuals, main="HW damped multiplicative", ylab="Residual")

The residuals of both models fluctuate around zero with no obvious trend or pattern, consistent with white noise.
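A more formal check could use checkresiduals(), which plots the residual ACF and runs a Ljung-Box test; a brief sketch for the undamped model:

# ACF, histogram and Ljung-Box test of the Holt-Winters residuals
checkresiduals(fit1)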
  5. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?
par(mfrow=c(1,1))
# Split the series: training data up to the end of 2010, test data from 2011 onwards
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=c(2011,1))
plot(myts.train)

# Seasonal naive benchmark over the test period
fc.naive = snaive(myts.train, h=length(myts.test))
accuracy(fc.naive, myts.test)

# Damped Holt-Winters multiplicative model, evaluated on the same test period
fit3 = hw(myts.train, seasonal="multiplicative", damped=TRUE, h=length(myts.test))
accuracy(fit3, myts.test)

On the training data the Holt-Winters model fits far better than the seasonal naive benchmark (training RMSE of roughly 8.7 versus 20.2). Given how much better it tracks the training data, it can be expected to beat the seasonal naive approach on the test set as well; the "Test set" rows returned by the two accuracy() calls above give the definitive comparison.

Exercise 7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# STL decomposition of the Box-Cox transformed series, with an ETS model fitted
# to the seasonally adjusted data (stlm() with lambda="auto" does exactly this)
myts.stl.fit <- stlm(myts.train, lambda="auto")
accuracy(myts.stl.fit)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.646515 8.147517 5.855529 -0.3001347 2.885877 0.3669624
##                    ACF1
## Training set 0.02389584
# Fit an ETS model with multiplicative error, additive trend and multiplicative
# seasonality (MAM), consistent with the multiplicative seasonality found above.
myts.ets.fit <- ets(myts.train, model = "MAM")
autoplot(myts.ets.fit)

accuracy(myts.ets.fit)
##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.3305082 9.010034 6.344188 -0.120261 3.114127 0.3975863
##                    ACF1
## Training set 0.05513695
On the training data, the STL + ETS model (RMSE of about 8.1) improves on the damped Holt-Winters fit from the previous exercise (about 8.7) and on the seasonal naive benchmark (about 20.2), while the directly specified MAM model (about 9.0) does slightly worse than the Holt-Winters fit. The comparison the question asks for, however, is on the test set; a sketch of that comparison follows.
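A minimal sketch of the test-set comparison, forecasting over the test period defined earlier (the object names myts.stl.fc and myts.ets.fc are ours):

# Forecast both fitted models over the test period and compare test-set accuracy
myts.stl.fc <- forecast(myts.stl.fit, h=length(myts.test))
myts.ets.fc <- forecast(myts.ets.fit, h=length(myts.test))
accuracy(myts.stl.fc, myts.test)
accuracy(myts.ets.fc, myts.test)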