library(fpp2)
library(ggplot2)
Consider the pigs series — the number of pigs slaughtered in Victoria each month.
Use the ses() function in R to find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.
fc <- ses(pigs, h = 4)
fc$model
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\), where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
s <- sd(residuals(fc))
ci_95 <- c(Lower = fc$mean[1] - 1.96*s, Upper = fc$mean[1] + 1.96*s)
ci_95
## Lower Upper
## 78679.97 118952.84
The interval produced by R is slightly wider:
ci_95_R <- c(fc$lower[1,2], fc$upper[1,2])
names(ci_95_R) <- c("Lower", "Upper")
ci_95_R
## Lower Upper
## 78611.97 119020.84
The explanation given by Hyndman: “This is because R estimates the variance of the residuals differently, taking account of the degrees of freedom properly.”
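The wider interval can be reproduced from the model's own error variance (a minimal sketch, assuming the ets object returned by ses() stores that variance in fc$model$sigma2):
# Rebuild the interval from the model's sigma estimate, which adjusts for
# the estimated parameters, rather than from plain sd(residuals).
s_model <- sqrt(fc$model$sigma2)
c(Lower = fc$mean[1] - 1.96 * s_model, Upper = fc$mean[1] + 1.96 * s_model)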
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
Plot the series and discuss the main features of the data.
autoplot(books)
Both series appear to trend upward over the 30 days, with substantial day-to-day fluctuations; with so few observations it is hard to identify any clear seasonal pattern.
Use the ses() function to forecast each series, and plot the forecasts.
ses.fc_pb <- ses(books[,'Paperback'], h = 4)
ses.fc_hc <- ses(books[, 'Hardcover'], h = 4)
autoplot(books) +
autolayer(ses.fc_pb, series="Paperback", PI=FALSE) +
autolayer(ses.fc_hc, series="Hardcover", PI=FALSE)
Compute the RMSE values for the training data in each case.
ses.rmse_pb <- sqrt(mean(residuals(ses.fc_pb)^2))
ses.rmse_hc <- sqrt(mean(residuals(ses.fc_hc)^2))
RMSE [Paperback] = 33.64
RMSE [Hardcover] = 31.93
accuracy(ses.fc_pb)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
accuracy(ses.fc_hc)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
We will continue with the daily sales of paperback and hardcover books in data set books.
Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
h.fc_pb <- holt(books[,'Paperback'], h = 4)
h.fc_hc <- holt(books[, 'Hardcover'], h = 4)
autoplot(books) +
autolayer(h.fc_pb, series="Paperback", PI=FALSE) +
autolayer(h.fc_hc, series="Hardcover", PI=FALSE)
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
accuracy(h.fc_pb)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
accuracy(h.fc_hc)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
The RMSE values from Holt’s method are lower than those from SES for both series. This is not surprising: both series show a clear positive trend, and Holt’s method models that trend component explicitly, so it is expected to fit the training data more closely (keeping in mind that it also uses one more parameter than SES).
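For convenience, the training RMSE values can be collected side by side (a minimal sketch reusing the fitted objects above; the values are those reported by accuracy()):
# Training RMSE of SES vs Holt for both series.
data.frame(Paperback = c(SES = accuracy(ses.fc_pb)[1, "RMSE"],
                         Holt = accuracy(h.fc_pb)[1, "RMSE"]),
           Hardcover = c(accuracy(ses.fc_hc)[1, "RMSE"],
                         accuracy(h.fc_hc)[1, "RMSE"]))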
Compare the forecasts for the two series using both methods. Which do you think is best?
The forecasts from Holt’s method [shown below] are higher than those from SES, and they seem more appropriate, given that both series show a clear positive linear trend that SES ignores.
fc <- data.frame(Paperback=c(ses.fc_pb$mean[1], h.fc_pb$mean[1]),
Hardcover=c(ses.fc_hc$mean[1], h.fc_hc$mean[1]),
row.names = c("SES", "HOLT"))
fc
## Paperback Hardcover
## SES 207.1097 239.5601
## HOLT 209.4668 250.1739
Calculate a \(95\%\) prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
The table below compares the prediction intervals computed from the RMSE values with the intervals produced by R using ses() and holt().
s <- accuracy(h.fc_pb)[1,"RMSE"]
r.ci_95_pb <- round(c(Lower = h.fc_pb$mean[1] - 1.96*s, Upper = h.fc_pb$mean[1] + 1.96*s), 2)
s <- accuracy(h.fc_hc)[1,"RMSE"]
r.ci_95_hc <- round(c(Lower = h.fc_hc$mean[1] - 1.96*s, Upper = h.fc_hc$mean[1] + 1.96*s), 2)
s.ci_95_pb <- round(c(ses.fc_pb$lower[1,2], ses.fc_pb$upper[1,2]), 2)
names(s.ci_95_pb) <- c("Lower", "Upper")
s.ci_95_hc <- round(c(ses.fc_hc$lower[1,2], ses.fc_hc$upper[1,2]), 2)
names(s.ci_95_hc) <- c("Lower", "Upper")
h.ci_95_pb <- round(c(h.fc_pb$lower[1,2], h.fc_pb$upper[1,2]), 2)
names(h.ci_95_pb) <- c("Lower", "Upper")
h.ci_95_hc <- round(c(h.fc_hc$lower[1,2], h.fc_hc$upper[1,2]), 2)
names(h.ci_95_hc) <- c("Lower", "Upper")
df.ci_95_R <- data.frame(Paperback=c(paste("(",r.ci_95_pb["Lower"],",",r.ci_95_pb["Upper"],")"),
paste("(",s.ci_95_pb["Lower"],",",s.ci_95_pb["Upper"],")"),
paste("(",h.ci_95_pb["Lower"],",",h.ci_95_pb["Upper"],")")),
Hardcover=c(paste("(",r.ci_95_hc["Lower"],",",r.ci_95_hc["Upper"],")"),
paste("(",s.ci_95_hc["Lower"],",",s.ci_95_hc["Upper"],")"),
paste("(",h.ci_95_hc["Lower"],",",h.ci_95_hc["Upper"],")")),
row.names = c("RMSE", "SES", "HOLT"))
df.ci_95_R
## Paperback Hardcover
## RMSE ( 148.44 , 270.5 ) ( 196.87 , 303.47 )
## SES ( 138.87 , 275.35 ) ( 174.78 , 304.34 )
## HOLT ( 143.91 , 275.02 ) ( 192.92 , 307.43 )
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
fc.1 <- holt(eggs, h = 100)
autoplot(eggs) +
autolayer(fc.1, series="Holt", PI=TRUE)
fc.1$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100)
##
## Smoothing parameters:
## alpha = 0.8124
## beta = 1e-04
##
## Initial states:
## l = 314.7232
## b = -2.7222
##
## sigma: 27.1665
##
## AIC AICc BIC
## 1053.755 1054.437 1066.472
accuracy(fc.1)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04499087 26.58219 19.18491 -1.142201 9.653791 0.9463626
## ACF1
## Training set 0.01348202
fc.2 <- holt(eggs, h = 100, damped = TRUE)
autoplot(eggs) +
autolayer(fc.2, series="Damped", PI=TRUE)
fc.2$model
## Damped Holt's method
##
## Call:
## holt(y = eggs, h = 100, damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.8462
## beta = 0.004
## phi = 0.8
##
## Initial states:
## l = 276.9842
## b = 4.9966
##
## sigma: 27.2755
##
## AIC AICc BIC
## 1055.458 1056.423 1070.718
accuracy(fc.2)
## ME RMSE MAE MPE MAPE MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
## ACF1
## Training set -0.003195358
lambda <- round(BoxCox.lambda(eggs), 2)  # Box-Cox lambda for eggs (~0.4); holt() below selects it automatically via lambda = "auto"
fc.3 <- holt(eggs, h = 100, lambda = "auto")
autoplot(eggs) +
autolayer(fc.3, series="BoxCox", PI=TRUE)
fc.3$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100, lambda = "auto")
##
## Box-Cox transformation: lambda= 0.3956
##
## Smoothing parameters:
## alpha = 0.809
## beta = 1e-04
##
## Initial states:
## l = 21.0322
## b = -0.1144
##
## sigma: 1.0549
##
## AIC AICc BIC
## 443.0310 443.7128 455.7475
accuracy(fc.3)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7736844 26.39376 18.96387 -1.072416 9.620095 0.9354593
## ACF1
## Training set 0.03887152
fc.4 <- holt(eggs, h = 100, damped = TRUE, lambda = "auto")
autoplot(eggs) +
autolayer(fc.4, series="BoxCox + Damped", PI=TRUE)
fc.4$model
## Damped Holt's method
##
## Call:
## holt(y = eggs, h = 100, damped = TRUE, lambda = "auto")
##
## Box-Cox transformation: lambda= 0.3956
##
## Smoothing parameters:
## alpha = 0.8356
## beta = 1e-04
## phi = 0.98
##
## Initial states:
## l = 21.6922
## b = -0.1429
##
## sigma: 1.068
##
## AIC AICc BIC
## 446.2962 447.2617 461.5560
accuracy(fc.4)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.8200445 26.53321 19.45654 -2.019718 9.976131 0.9597618
## ACF1
## Training set 0.005852382
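Collecting the training RMSE of the four variants into a single vector makes the comparison easier (a minimal sketch reusing the models fitted above):
# Training RMSE for each of the four Holt variants.
rmse <- sapply(list(Holt = fc.1, Damped = fc.2,
                    BoxCox = fc.3, `BoxCox + Damped` = fc.4),
               function(f) accuracy(f)[1, "RMSE"])
round(rmse, 3)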
Among the models above, the one with only the Box-Cox transformation (\(\lambda \approx 0.4\)) has the best (lowest) training RMSE.
Recall your retail time series data (from Exercise 3 in Section 2.10).
Why is multiplicative seasonality necessary for this series?
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,"A3349398A"], frequency=12, start=c(1982,4))
autoplot(myts)
The seasonal variations grow roughly in proportion to the level of the series (as is evident in the plot), so a multiplicative seasonal component is the more appropriate and effective choice for this series.
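This can also be checked numerically (an illustrative sketch, not required by the exercise; hw_add and hw_mult are new objects): fitting both seasonal variants and comparing their training RMSE, where the multiplicative fit should track the growing seasonal swings more closely.
# Fit both seasonal variants and compare training RMSE (illustration only).
hw_add <- hw(myts, seasonal = "additive")
hw_mult <- hw(myts, seasonal = "multiplicative")
c(Additive = accuracy(hw_add)[1, "RMSE"],
  Multiplicative = accuracy(hw_mult)[1, "RMSE"])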
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
hw.fc.1 <- hw(myts, seasonal = "multiplicative")
autoplot(hw.fc.1)
hw.fc.2 <- hw(myts, seasonal = "multiplicative", damped = TRUE)
autoplot(hw.fc.2)
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
accuracy(hw.fc.1)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.496135 29.43051 22.25676 0.1603693 1.799731 0.2926361
## ACF1
## Training set -0.0300701
accuracy(hw.fc.2)
## ME RMSE MAE MPE MAPE MASE
## Training set 4.244765 29.63087 22.26018 0.2959731 1.785469 0.292681
## ACF1
## Training set -0.2184672
Based on its slightly lower RMSE, the non-damped method would be preferred. This is not surprising, given that the trend is close to linear and we would expect it to continue that way.
Check that the residuals from the best method look like white noise.
checkresiduals(hw.fc.1)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 244.86, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
Although the histogram of the residuals looks roughly normal, several autocorrelations in the ACF plot are significant and the Ljung-Box test strongly rejects the white-noise hypothesis (p-value < 2.2e-16), so the residuals do not look like white noise.
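The same Ljung-Box test can be run by hand (a sketch; checkresiduals() above already reports it), using 24 lags and fitdf = 16 for the 16 estimated model parameters:
# Ljung-Box test on the residuals: 24 lags, 16 model degrees of freedom.
Box.test(residuals(hw.fc.1), lag = 24, fitdf = 16, type = "Ljung-Box")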
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
myts.train <- window(myts, end = c(2010, 12))
myts.test <- window(myts, start=2011)
hw.fc.train <- hw(myts.train, seasonal = "multiplicative")
hw.fc.a <- accuracy(hw.fc.train, x = myts)
hw.fc.a
## ME RMSE MAE MPE MAPE MASE
## Training set 2.594132 29.87520 22.43516 0.2383033 1.864671 0.2985975
## Test set -29.588194 49.09806 38.39804 -1.1062616 1.458304 0.5110531
## ACF1 Theil's U
## Training set -0.03687893 NA
## Test set 0.09931874 0.267754
sn.fc.train <- snaive(myts.train)
sn.fc.a <- accuracy(sn.fc.train, myts.test)
sn.fc.a
## ME RMSE MAE MPE MAPE MASE
## Training set 73.94114 88.31208 75.13514 6.068915 6.134838 1.000000
## Test set 115.00000 127.92727 115.00000 4.459712 4.459712 1.530576
## ACF1 Theil's U
## Training set 0.6312891 NA
## Test set 0.2653013 0.7267171
The test-set RMSE of 49.1 from Holt-Winters’ method is far better than the 127.93 from the seasonal naïve approach, so it clearly beats the benchmark.
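The two test-set RMSE values can be pulled straight from the accuracy() matrices above (a minimal sketch):
# Test-set RMSE side by side.
c(`Holt-Winters` = hw.fc.a["Test set", "RMSE"],
  `Seasonal naive` = sn.fc.a["Test set", "RMSE"])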
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
stlf.fc <- stlf(myts.train, lambda = "auto")
stlf.fc.a <- accuracy(stlf.fc, myts.test)
stlf.fc.a
## ME RMSE MAE MPE MAPE MASE
## Training set -1.077737 26.34835 19.95571 -0.020850 1.675349 0.2655976
## Test set -101.367987 117.88372 101.84801 -3.846709 3.865079 1.3555310
## ACF1 Theil's U
## Training set -0.04737883 NA
## Test set 0.52778217 0.656884
If the method argument is not specified, the stlf() function applies an ETS model to the seasonally adjusted series. With lambda set to "auto", an (optimal) Box-Cox transformation is applied first.
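For clarity, the same call can be written with the defaults spelled out (a sketch; it assumes stlf() passes the method argument through to the underlying stlm()/ETS fit, and stlf.fc.explicit is a new name used only for this illustration):
# Equivalent call with the defaults made explicit (assumed pass-through).
stlf.fc.explicit <- stlf(myts.train, method = "ets", lambda = "auto")
accuracy(stlf.fc.explicit, myts.test)["Test set", "RMSE"]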
The test-set RMSE of 49.1 from Holt-Winters’ method is still considerably better than the 117.88 from this STL + ETS approach.