Chapter 7 - Exponential smoothing

Consider the pigs series - the number of pigs slaughtered in Victoria each month.

Use the ses function in R to find the optimal values of alpha and l0, and generate forecasts for the next four months.
Compute a 95% prediction interval for the first forecast using ^y ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

str(pigs)

##  Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...

head(pigs)

##         Jan    Feb    Mar    Apr    May    Jun
## 1980  76378  71947  33873  96428 105084  95741

How SES model are fitted:

ses_pigs <- ses(pigs, h = 4)
ses_pigs$model

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

95% prediction interval for the first forecast:

(ses_pigs$upper[1, "95%"])

##      95% 
## 119020.8

(ses_pigs$lower[1, "95%"])

##      95% 
## 78611.97

calculate 95% prediction interval using formula:

s <- sd(ses_pigs$residuals)
(ses_pigs$mean[1] + 1.96*s)

## [1] 118952.8

(ses_pigs$mean[1] - 1.96*s)

## [1] 78679.97

autoplot(ses_pigs) +
  autolayer(ses_pigs$fitted)

Fitted values and forecasts are plotted
Even though small compared to the data scale, the results were a little different from the results of ses function

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days' sales for paperback and hardcover books.

Plot the series and discuss the main features of the data.
Use the ses function to forecast each series, and plot the forecasts.
Compute the RMSE values for the training data in each case.

str(books)

##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"

head(books)

## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
##   Paperback Hardcover
## 1       199       139
## 2       172       128
## 3       111       172
## 4       209       139
## 5       161       191
## 6       119       168

autoplot(books)

The sales of paperback and hardcover books generally increased as time went on with lots of fluctuations.
But the fluctuations doesn’t show particular frequency that they can be thought of as cycle.

ses_paperback <- ses(books[, "Paperback"], h = 4)
ses_hardcover <- ses(books[, "Hardcover"], h = 4)

autoplot(books[, "Paperback"], series = "Paperback") +
  autolayer(ses_paperback, series = "Paperback") +
  autolayer(books[, "Hardcover"], series = "Hardcover") +
  autolayer(ses_hardcover, series = "Hardcover", PI = FALSE) +
  ylab("Sales Amount") +
  ggtitle("Books Sales")

Can see the flat forecast by SES method.

sqrt(mean(ses_paperback$residuals^2))

## [1] 33.63769

sqrt(mean(ses_hardcover$residuals^2))

## [1] 31.93101

RMSE: hardcover_sales < paperback_sales
RMSE values for the training data show that the variance of the residuals of hardcover sales was smaller than the one of paperback sales.

Now apply Holt's linear method to the paperback and hardback series and compute four-day forecasts in each case.
Compare the RMSE measures of Holt's method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt's method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
Compare the forecasts for the two series using both methods. Which do you think is best?
Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

paperback <- holt(books[, "Paperback"], h = 4)
hardcover <- holt(books[, "Hardcover"], h = 4)

autoplot(books[, "Paperback"]) + autolayer(paperback)

autoplot(books[, "Hardcover"]) + autolayer(hardcover)

Can see the linear trend in the forecasts.

(s_paperback <- sqrt(mean(paperback$residuals^2)))

## [1] 31.13692

(s_hardcover <- sqrt(mean(hardcover$residuals^2)))

## [1] 27.19358

In both, RMSE values became lower when Holt’s method was used.
If there is linearly approximable trend in data, it would be better to use Holt’s linear method even if one more parameter is needed than SES.
Or else, it would be better to use SES method to make the model simpler.

I think that the forecasts of hardcover sales were better than the ones of paperback sales. Because RMSE value is lower for hardcover sales. And because the forecasts of paperback sales couldn’t reflect the pattern in the data using Holt’s method.

cat("95% PI of paperback sales calculated by holt function:\nupper: ",
paperback$upper[1, "95%"], "\nlower: ", paperback$lower[1, "95%"])

## 95% PI of paperback sales calculated by holt function:
## upper:  275.0205 
## lower:  143.913

cat("95% PI of paperback sales calculated by formula:\nmean+: ",
paperback$mean[1] + 1.96*s_paperback, "\nmean-: ", paperback$mean[1] - 1.96*s_paperback)

## 95% PI of paperback sales calculated by formula:
## mean+:  270.4951 
## mean-:  148.4384

cat("95% PI of hardcover sales calculated by holt function:\nupper: ",
hardcover$upper[1, "95%"], "\nlower: ", hardcover$lower[1, "95%"])

## 95% PI of hardcover sales calculated by holt function:
## upper:  307.4256 
## lower:  192.9222

cat("95% PI of hardcover sales calculated by formula:\nmean+: ",
hardcover$mean[1] + 1.96*s_hardcover, "\nmean-: ", hardcover$mean[1] - 1.96*s_hardcover)

## 95% PI of hardcover sales calculated by formula:
## mean+:  303.4733 
## mean-:  196.8745

In this case, the prediction interval for the first forecast for each series was almost same regardless of calculating method. It is different from the SES case, in which the PI was different when it was calculated by ses function and formula respectively.

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

str(eggs)

##  Time-Series [1:94] from 1900 to 1993: 277 315 315 321 315 ...

head(eggs)

## Time Series:
## Start = 1900 
## End = 1905 
## Frequency = 1 
## [1] 276.79 315.42 314.87 321.25 314.54 317.92

autoplot(eggs)

Downward trend of the price of dozen eggs in US is noticed.
With damped = TRUE in holt function and Box-Cox options will yield best forecasts.
Expected to see the price of eggs to decrease more slowly as the price is going to near 0.
There’s a need to make the size of the seasonal variation smaller for bigger prices.

1) Just use holt function without using any options:

holt_eggs <- holt(eggs, h = 100)
autoplot(holt_eggs) + autolayer(holt_eggs$fitted)

Unrealistic because the predicted price is going to be below 0.

2) Use holt function with damped option:

holt_damped_eggs <- holt(eggs, damped = TRUE, h = 100)
autoplot(holt_damped_eggs) + autolayer(holt_damped_eggs$fitted)

The predicted price is now above 0, but point forecasts didn’t reflect the existing trend.

3) Use holt function with Box-Cox transformation:

holt_BoxCox_eggs <- holt(eggs, lambda = BoxCox.lambda(eggs), h = 100)
autoplot(holt_BoxCox_eggs) + autolayer(holt_BoxCox_eggs$fitted)

Now, the point forecasts didn’t go below 0 and reflected the existing trend.

4) Use holt function with Box-Cox transformation and damped option:

holt_BoxCox_damped_eggs <- holt(eggs, damped = TRUE, lambda = BoxCox.lambda(eggs), h = 100)
autoplot(holt_BoxCox_damped_eggs) + autolayer(holt_BoxCox_damped_eggs$fitted)

The point forecasts didn’t go below 0 and are still decreasing. But they didn’t reflect the existing trend well. Lower ends of prediction intervals were below 0.

Show RMSE values for each model:

cat("RMSE using holt function = ", sqrt(mean(holt_eggs$residuals^2)), "\nRMSE using holt function with damped option = ", sqrt(mean(holt_damped_eggs$residuals^2)), "\nRMSE using holt function with Box-Cox transformation = ", sqrt(mean(holt_BoxCox_eggs$residuals^2)), "\nRMSE using holt function with damped option and Box-Cox transformation = ", sqrt(mean(holt_BoxCox_damped_eggs$residuals^2)))

## RMSE using holt function =  26.58219 
## RMSE using holt function with damped option =  26.54019 
## RMSE using holt function with Box-Cox transformation =  1.032217 
## RMSE using holt function with damped option and Box-Cox transformation =  1.039187

BoxCox transformation captures trend and reflects it to the forecasts, therefore it improves accuracy of the model.
Holt’s method with damped option just prohibits the forecasts to be below 0, not much improving accuracy .
The best model was the Box-Cox transformation with Holt’s linear method. It gave plausible point forecasts and prediction intervals.
For 100 years’ prediction, Box-Cox transformation did enough damping effect.
With damping option together, the point forecast couldn’t follow the existing trend.

Recall your retail time series data (from Exercise 3 in Section 2.10).

Why is multiplicative seasonality necessary for this series?
Apply Holt-Winters' multiplicative method to the data. Experiment with making the trend damped.
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
Check that the residuals from the best method look like white noise.
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 7 in Section 3.7?

retail <- read_excel("/Users/hovig/Downloads/retail.xlsx", skip=1)
ts_retail <- ts(retail[, "A3349873A"], frequency = 12, start = c(1982, 4))

autoplot(ts_retail)

the data show that the seasonality indices increased when the retail sales increased. Multiplicative seasonality can reflect the situation in the model, while additive seasonality can’t.

ets_AAM_retail <- hw(ts_retail, seasonal = "multiplicative")
ets_AAdM_retail <- hw(ts_retail, seasonal = "multiplicative", damped = TRUE)

autoplot(ets_AAM_retail)

autoplot(ets_AAdM_retail)

The forecasts increased more slowly when damped option was used than it wasn’t used.

error_ets_AAM_retail <- tsCV(ts_retail, hw, h = 1, seasonal = "multiplicative")
error_ets_AAdM_retail <- tsCV(ts_retail, hw, h = 1, seasonal = "multiplicative", damped = TRUE)

sqrt(mean(error_ets_AAM_retail^2, na.rm = TRUE))

## [1] 14.72762

sqrt(mean(error_ets_AAdM_retail^2, na.rm = TRUE))

## [1] 14.94306

When the RMSE values were compared, they were almost same. Therefore I prefer damped model because it will prohibit the limitless increase of sales forecast.

checkresiduals(ets_AAdM_retail)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 42.932, df = 7, p-value = 3.437e-07
## 
## Model df: 17.   Total lags used: 24

Unfortunately, the residuals from the best method don’t look like white noise. Ljung-Box test result and ACF plot show that the residuals aren’t white noise.

ts_retail_train <- window(ts_retail, end = c(2010, 12))
ts_retail_test <- window(ts_retail, start = 2011)

Holt-Winters’ method with damped option:

retail_train_ets_AAdM <- hw(ts_retail_train, h = 36, seasonal = "multiplicative", damped = TRUE)

autoplot(retail_train_ets_AAdM)

accuracy(retail_train_ets_AAdM, ts_retail_test)

##                      ME       RMSE      MAE        MPE      MAPE      MASE
## Training set  0.4556121   8.681456  6.24903  0.2040939  3.151257 0.3916228
## Test set     94.7346169 111.911266 94.73462 24.2839784 24.283978 5.9369594
##                     ACF1 Theil's U
## Training set -0.01331859        NA
## Test set      0.60960299   1.90013

Holt-Winters’ method without damped option:

retail_train_ets_AAM <- hw(ts_retail_train, h = 36, seasonal = "multiplicative")

autoplot(retail_train_ets_AAM)

accuracy(retail_train_ets_AAM, ts_retail_test)

##                       ME      RMSE       MAE          MPE      MAPE
## Training set  0.03021223  9.107356  6.553533  0.001995484  3.293399
## Test set     78.34068365 94.806617 78.340684 19.945024968 19.945025
##                   MASE       ACF1 Theil's U
## Training set 0.4107058 0.02752875        NA
## Test set     4.9095618 0.52802701  1.613903

When I used Holt-Winters’ method with damped option, I couldn’t beat seasonal naive approach.
When I used Holt-Winters’ method without damped option, I could get better accuracy than when I used the option but it still couldn’t beat the seasonal naive approach.
In this case, damped Holt-Winters’ method was worse than Holt-Winters’ method because the actual sales amount in the forecast horizon was exponentially increasing, not damping.
I think that this case reflects the fact that the assumption behind the chosen forecast method should be right to forecast more accurately.

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Forecast using stl model:

##                      ME      RMSE       MAE        MPE      MAPE      MASE
## Training set -0.6782982  8.583559  5.918078 -0.3254076  2.913104 0.3708823
## Test set     82.1015276 98.384220 82.101528 21.0189982 21.018998 5.1452516
##                    ACF1 Theil's U
## Training set 0.02704667        NA
## Test set     0.52161725  1.679783

I didn’t need to use seasadj function because forecasts of STL objects are applying non-seasonal forecasting method to the seasonally adjusted data automatically.

Forecasting without doing transformation:

##                      ME     RMSE       MAE       MPE      MAPE      MASE
## Training set -0.5020795 10.00826  6.851597 -0.391432  3.489759 0.4293853
## Test set     74.2529959 91.04491 74.252996 18.837766 18.837766 4.6533890
##                    ACF1 Theil's U
## Training set 0.09741266        NA
## Test set     0.48917501  1.549271

ETS forecasting after STL decomposition with Box-Cox transformation yielded better result than when ETS(A, Ad, M) was used. But the method was a little worse than ETS(A, A, M). It still couldn’t beat seasonal naive method.
Without doing transformation, when I got accuracy using test set I got better result. But I couldn’t expect it because when I also used transformation, the accuracy of training set was better. In fact, the actual values in forecast horizon increased exponentially.
Without using transformation, the forecast could reflect the fact that the bigger values have bigger variation and it was useful at forecasting at the time.
ETS forecasting after STL decomposition ‘without’ Box-Cox transformation yielded better result than when ETS(A, Ad, M) or ETS(A, A, M) was used. But the method also couldn’t beat seasonal naive method.

Data 624 - Homework 5

Ohannes (Hovig) Ohannessian

3/7/2019

Chapter 7 - Exponential smoothing