7.8 Exercises

7.8.5 Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

a. Plot the series and discuss the main features of the data.

From the overall plot of the series, we see that sales of the two formats sometimes move in opposite directions - when Paperback sells well, Hardcover sales dip, and vice versa. Both series trend upwards over the 30 days, and by the end Hardcover sales are almost always higher than Paperback's, even though Hardcover started well below Paperback on day 1. This suggests a substitution effect: over time, people buy hardcovers instead of paperbacks.

For a white noise series, we expect 95% of the spikes in the ACF to lie within the dashed blue lines. From ggAcf(), two spikes fall outside the bounds while the rest lie within them; here T = 30, so the bounds are at ±2/√30 ≈ ±0.37. The series are therefore not white noise.

Since the data are not white noise, there may be a trend or seasonality to extract. The series are non-stationary, so I perform an STL decomposition after converting to weekly data (frequency = 7) to examine their features. The trend component is increasing for both Paperback and Hardcover, and the weekly seasonal component peaks on days 1 and 6 for both series, with sales dropping sharply on days 3 and 4.

From the summary statistics, we see that Hardcover books sell more overall - both the mean and the median are higher.

# Load fpp2, which provides the books data and the forecasting/plotting functions
library(fpp2)

# Daily pattern - general series
autoplot(books) + 
  ylab("Number of book sales") + xlab("Day")

# Daily pattern - sub series
autoplot(books, facets=TRUE) + 
  ylab("Number of book sales") + xlab("Day")

# Weekly pattern
ts(books[, 'Paperback'], frequency = 7) %>%
  stl(s.window = 'periodic', robust = TRUE) %>%
  autoplot()

ts(books[, 'Hardcover'], frequency = 7) %>%
  stl(s.window = 'periodic', robust = TRUE) %>%
  autoplot()

# Autocorrelation for each book
ggAcf(books[, 'Paperback'])

ggAcf(books[, 'Hardcover'])
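
The ±0.37 bound quoted above can be checked directly from the series length (a one-liner; both series have T = 30 observations):

# 95% significance bounds for a white noise ACF: +/- 2/sqrt(T)
2 / sqrt(length(books[, 'Paperback']))
## [1] 0.3651484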

# summary statistics
summary(books)
##    Paperback       Hardcover    
##  Min.   :111.0   Min.   :128.0  
##  1st Qu.:167.2   1st Qu.:170.5  
##  Median :189.0   Median :200.5  
##  Mean   :186.4   Mean   :198.8  
##  3rd Qu.:207.2   3rd Qu.:222.0  
##  Max.   :247.0   Max.   :283.0

b. Use the ses() function to forecast each series, and plot the forecasts.

Since SES is suitable for data with no clear trend or seasonal pattern, the point forecasts are flat: every forecast horizon gets the same value.

# Estimate parameters and forecast the next four days
fc_paper <- ses(books[, 'Paperback'], h=4)
fc_hard <- ses(books[, 'Hardcover'], h=4)

autoplot(fc_paper) +
  autolayer(fitted(fc_paper), series="Fitted") +
  ylab("Number of paperback sales") + xlab("Day")

autoplot(fc_hard) +
  autolayer(fitted(fc_hard), series="Fitted") +
  ylab("Number of hardcover sales") + xlab("Day")

c. Compute the RMSE values for the training data in each case.

The training-set RMSE for Paperback is higher than that for Hardcover.

accuracy(fc_paper)
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
accuracy(fc_hard)
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763
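
The RMSE values above can be reproduced directly from the one-step training residuals:

# RMSE = square root of the mean squared one-step in-sample error
sqrt(mean(residuals(fc_paper)^2))
## [1] 33.63769
sqrt(mean(residuals(fc_hard)^2))
## [1] 31.93101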

7.8.6

a. Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

As expected, since Holt's method models a linear trend, the forecast is a straight, upward-sloping line.

fc_paper_hl <- holt(books[, 'Paperback'], h=4)
fc_hard_hl <- holt(books[, 'Hardcover'], h=4)

autoplot(fc_paper_hl) +
  autolayer(fitted(fc_paper_hl), series="Fitted") +
  ylab("Number of paperback sales") + xlab("Day")

autoplot(fc_hard_hl) +
  autolayer(fitted(fc_hard_hl), series="Fitted") +
  ylab("Number of hardcover sales") + xlab("Day")

accuracy(fc_paper_hl)
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
accuracy(fc_hard_hl)
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186

b. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

The RMSEs for Holt's method are lower than SES's for both series. Holt's method extends SES with a trend component (at the cost of one more smoothing parameter), making it well suited to these data, which show a clear upward trend; SES is appropriate only for data without trend or seasonality.

# Let's compare training-set RMSE between Holt's linear method and SES

# Holt's
print(accuracy(fc_paper_hl))
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
print(accuracy(fc_hard_hl))
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
# SES
print(accuracy(fc_paper))
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
print(accuracy(fc_hard))
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763
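
Pulling out just the RMSE column gives a more compact comparison (rmse() below is a small hypothetical helper, not a package function):

# Training RMSE for each method and series
rmse <- function(fc) accuracy(fc)["Training set", "RMSE"]
rbind(
  SES  = c(Paperback = rmse(fc_paper),    Hardcover = rmse(fc_hard)),
  Holt = c(Paperback = rmse(fc_paper_hl), Hardcover = rmse(fc_hard_hl))
)
##      Paperback Hardcover
## SES   33.63769  31.93101
## Holt  31.13692  27.19358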

c. Compare the forecasts for the two series using both methods. Which do you think is best?

Plotting both methods together makes the difference clear: SES projects a flat line, while Holt's method continues the upward trend. I would choose Holt's for these series, since both show a clear trend that SES ignores.

# Overlay the SES and Holt forecasts on each series (PI=FALSE hides the intervals)
autoplot(books[, 'Paperback']) +
  autolayer(fc_paper, series="SES", PI=FALSE) +
  autolayer(fc_paper_hl, series="Holt", PI=FALSE) +
  ylab("Number of paperback sales") + xlab("Day")

autoplot(books[, 'Hardcover']) +
  autolayer(fc_hard, series="SES", PI=FALSE) +
  autolayer(fc_hard_hl, series="Holt", PI=FALSE) +
  ylab("Number of hardcover sales") + xlab("Day")

d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

Each manual interval is the point forecast ±1.96 × training RMSE. These intervals come out slightly narrower than the 95% intervals reported by ses() and holt(), which use the estimated residual standard deviation rather than the raw RMSE. The SES intervals are wider than Holt's for each series, consistent with SES's higher RMSE.
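
A small helper makes the calculation explicit and ensures each series uses its own model's RMSE (pi95 is a hypothetical name, not a forecast-package function):

# 95% interval for the first forecast: point forecast +/- 1.96 * training RMSE,
# assuming normally distributed forecast errors
pi95 <- function(fc) {
  rmse <- accuracy(fc)["Training set", "RMSE"]
  fc$mean[1] + c(lower = -1.96, upper = 1.96) * rmse
}
pi95(fc_paper_hl)  # reproduces the manual calculation below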

# Holt's
print(forecast(fc_paper_hl))[1,]  # prints the full table, then just the first row
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735
##    Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.913 275.0205
holt(books[, 'Paperback'], h=1)$mean[1] - (1.96 * accuracy(fc_paper_hl)[2])
## [1] 148.4384
holt(books[, 'Paperback'], h=1)$mean[1] + (1.96 * accuracy(fc_paper_hl)[2])
## [1] 270.4951
print(forecast(fc_hard_hl))[1,]
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334
##    Point Forecast   Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.739 287.6087 192.9222 307.4256
holt(books[, 'Hardcover'], h=1)$mean[1] - (1.96 * accuracy(fc_hard_hl)[2])
## [1] 196.8745
holt(books[, 'Hardcover'], h=1)$mean[1] + (1.96 * accuracy(fc_hard_hl)[2])
## [1] 303.4733
# SES
print(forecast(fc_paper))[1,]
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005
##    Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.867 275.3523
ses(books[, 'Paperback'], h=1)$mean[1] - (1.96 * accuracy(fc_paper)[2])
## [1] 141.1798
ses(books[, 'Paperback'], h=1)$mean[1] + (1.96 * accuracy(fc_paper)[2])
## [1] 273.0396
print(forecast(fc_hard))[1,]
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
## 32       239.5601 194.9788 284.1414 171.3788 307.7414
## 33       239.5601 192.8607 286.2595 168.1396 310.9806
## 34       239.5601 190.8347 288.2855 165.0410 314.0792
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
ses(books[, 'Hardcover'], h=1)$mean[1] - (1.96 * accuracy(fc_hard)[2])
## [1] 176.9753
ses(books[, 'Hardcover'], h=1)$mean[1] + (1.96 * accuracy(fc_hard)[2])
## [1] 302.1449

7.8.10 For this exercise use data set ukcars, the quarterly UK passenger vehicle production data from 1977Q1-2005Q1.

a. Plot the data and describe the main features of the series.

The ACF, STL decomposition and lag plot all show strong seasonality with period 4 (i.e. annual, since the data are quarterly). The trend has been increasing since about 1980, after an initial decline. The seasonal subseries plot shows that Q1 generally has the highest production and Q3 the lowest. The data are neither white noise nor stationary - the ACF has significant spikes at many lags. The summary statistics show the mean is close to the median, suggesting a fairly symmetric distribution.

# general plot
autoplot(ukcars) +
  ylab("Car production") + xlab("Quarters")

# STL decomposition
ukcars %>%
  stl(s.window = 'periodic', robust = TRUE) %>%
  autoplot()

# Lag plot
gglagplot(ukcars)

# Autocorrelation
ggAcf(ukcars)

# seasonal plot
ggseasonplot(ukcars, year.labels=TRUE, year.labels.left=TRUE) + 
  ylab("Car production") +
  ggtitle("Seasonal plot: UK car product")

ggseasonplot(ukcars, polar=TRUE) + 
  ylab("Car production") +
  ggtitle("Seasonal plot: UK car product")

# seasonal subseries plot
ggsubseriesplot(ukcars) +
  ylab("Car production") +
  ggtitle("Seasonal subseries plot: UK car product")

# summary statistics
summary(ukcars)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   171.2   267.0   335.1   333.5   391.8   494.3

b. Decompose the series using STL and obtain the seasonally adjusted data.

Note that the seasonally adjusted data are obtained as the original data minus the seasonal component (for an additive decomposition).

# fit normal data
fit <- stl(ukcars, s.window="periodic",
  robust=TRUE)

# Extract the seasonally adjusted data
# (spot check: 1977 Q2 = 371.051 (original) - 21.8575 (seasonal) = 349.1935)
fit_sea <- seasadj(fit)
head(fit_sea)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 1977 306.9244 349.1935 315.8259 344.0282
## 1978 335.0444 340.9645
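
A quick consistency check confirms that seasadj() returns the original series minus the seasonal component:

# Seasonally adjusted = original - seasonal (additive decomposition)
all.equal(fit_sea, ukcars - fit$time.series[, "seasonal"])
## [1] TRUE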

c. Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be done in one step using stlf() with arguments etsmodel="AAN", damped=TRUE.)

Since the seasonality has already been removed, we fit a model with additive error, additive trend and no seasonal component (AAN). A damped trend usually forecasts better at long horizons because it stops the trend from growing indefinitely. (Two years of quarterly data corresponds to h = 8; only h = 2 is computed here, so just the first two quarters are shown.)

fcast <- stlf(fit_sea, etsmodel = "AAN", damped=TRUE, h = 2)
fcast
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       393.1714 362.6002 423.7426 346.4167 439.9261
## 2005 Q3       409.4102 372.0259 446.7945 352.2358 466.5846
# general plot
autoplot(fcast) +
  ylab("Car production") + xlab("Quarters")

d. Forecast the next two years of the series using Holt's linear method applied to the seasonally adjusted data (as before but with damped=FALSE).

Again this is an AAN model on the seasonally adjusted data. Without damping, the point forecasts come out slightly higher than with damping, since the trend is extrapolated at full strength.

fcast_nodamp <- stlf(fit_sea, etsmodel = "AAN", damped=FALSE, h = 2)
fcast_nodamp
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       394.3341 363.9375 424.7306 347.8465 440.8216
## 2005 Q3       411.4903 373.9466 449.0339 354.0722 468.9083
# general plot
autoplot(fcast_nodamp) +
  ylab("Car production") + xlab("Quarters")

e. Now use ets() to choose a seasonal model for the data.

Note that ets() automatically selects the model that minimises the AICc. Here the chosen model is ETS(A,N,A): additive error, no trend, additive seasonality.

# ETS models - ANA
ets_model <- ets(ukcars)
summary(ets_model)
## ETS(A,N,A) 
## 
## Call:
##  ets(y = ukcars) 
## 
##   Smoothing parameters:
##     alpha = 0.6199 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 314.2568 
##     s = -1.7579 -44.9601 21.1956 25.5223
## 
##   sigma:  25.9302
## 
##      AIC     AICc      BIC 
## 1277.752 1278.819 1296.844 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
##                    ACF1
## Training set 0.02573334
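
ets() reports only the winning model; to see the selection criterion at work, we can force an alternative specification and compare AICc values (the damped AAA alternative below is our own choice for illustration):

# Force a damped additive-seasonal model and compare by AICc (lower is better)
ets_alt <- ets(ukcars, model = "AAA", damped = TRUE)
c(ANA = ets_model$aicc, AAdA = ets_alt$aicc)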

f. Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits?

Comparing training RMSEs, the STL model with an undamped AAN fit gives the best in-sample fit (the lowest RMSE, 23.30), just ahead of the damped version (23.32); the seasonal ETS(A,N,A) model is worst (25.23).

# STL model - AAN with damping (training RMSE 23.32)
summary(fcast)
## 
## Forecast method: STL +  ETS(A,Ad,N)
## 
## Model Information:
## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = x, model = etsmodel, damped = TRUE, allow.multiplicative.trend = allow.multiplicative.trend) 
## 
##   Smoothing parameters:
##     alpha = 0.7037 
##     beta  = 1e-04 
##     phi   = 0.9243 
## 
##   Initial states:
##     l = 341.6239 
##     b = -5.0799 
## 
##   sigma:  23.8549
## 
##      AIC     AICc      BIC 
## 1257.950 1258.743 1274.314 
## 
## Error measures:
##                    ME     RMSE      MAE        MPE     MAPE     MASE
## Training set 1.551267 23.32113 18.48987 0.07585736 5.955086 0.602576
##                    ACF1
## Training set 0.02262668
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       393.1714 362.6002 423.7426 346.4167 439.9261
## 2005 Q3       409.4102 372.0259 446.7945 352.2358 466.5846
# STL model - AAN without damping (training RMSE 23.30)
summary(fcast_nodamp)
## 
## Forecast method: STL +  ETS(A,A,N)
## 
## Model Information:
## ETS(A,A,N) 
## 
## Call:
##  ets(y = x, model = etsmodel, damped = FALSE, allow.multiplicative.trend = allow.multiplicative.trend) 
## 
##   Smoothing parameters:
##     alpha = 0.7248 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 328.1086 
##     b = 0.9202 
## 
##   sigma:  23.7186
## 
##      AIC     AICc      BIC 
## 1255.697 1256.258 1269.334 
## 
## Error measures:
##                      ME   RMSE     MAE        MPE     MAPE      MASE
## Training set -0.3412727 23.295 18.1605 -0.5520141 5.884617 0.5918418
##                    ACF1
## Training set 0.02103582
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       394.3341 363.9375 424.7306 347.8465 440.8216
## 2005 Q3       411.4903 373.9466 449.0339 354.0722 468.9083
# ETS model - ANA (training RMSE 25.23)
summary(ets_model)
## ETS(A,N,A) 
## 
## Call:
##  ets(y = ukcars) 
## 
##   Smoothing parameters:
##     alpha = 0.6199 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 314.2568 
##     s = -1.7579 -44.9601 21.1956 25.5223
## 
##   sigma:  25.9302
## 
##      AIC     AICc      BIC 
## 1277.752 1278.819 1296.844 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
##                    ACF1
## Training set 0.02573334
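
Collecting the three training RMSEs side by side (values taken from the summaries above):

# Training RMSE for each approach
c(STL_AAdN = accuracy(fcast)["Training set", "RMSE"],
  STL_AAN  = accuracy(fcast_nodamp)["Training set", "RMSE"],
  ETS_ANA  = accuracy(forecast(ets_model))["Training set", "RMSE"])
##  STL_AAdN   STL_AAN   ETS_ANA 
##  23.32113  23.29500  25.23244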

g. Compare the forecasts from the three approaches? Which seems most reasonable?

Estimating out-of-sample accuracy with time series cross-validation (tsCV()), the STL model with a damped trend seems the most reasonable - it has the lowest cross-validated RMSE.

# Cross-validated forecast errors - STL with damped trend
e <- tsCV(fit_sea, stlf, etsmodel = "AAN", damped=TRUE, h=2)

# CV RMSE - STL with damped trend
sqrt(mean(e^2, na.rm=TRUE))
## [1] 29.76946

# Cross-validated forecast errors - STL without damping
e2 <- tsCV(fit_sea, stlf, etsmodel = "AAN", damped=FALSE, h=2)

# CV RMSE - STL without damping
sqrt(mean(e2^2, na.rm=TRUE))
## [1] 30.5195

# Cross-validated forecast errors - ets(), wrapped so the model is refitted
# at each forecast origin
f <- function(y, h) {
  forecast(ets(y), h = h)
}

e3 <- tsCV(ukcars, f, h=2)

# CV RMSE - ets (ANA)
sqrt(mean(e3^2, na.rm=TRUE))
## [1] 34.4618
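
Since tsCV() returns a matrix of forecast errors with one column per horizon, accuracy can also be broken down by horizon (a quick sketch):

# RMSE at each forecast horizon (columns h=1 and h=2)
sqrt(colMeans(e^2, na.rm = TRUE))
sqrt(colMeans(e2^2, na.rm = TRUE))
sqrt(colMeans(e3^2, na.rm = TRUE))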



h. Check the residuals of your preferred model.

Since the Ljung-Box p-value is well below 0.05 and significant spikes appear in the residual ACF, we reject the null hypothesis that the residuals are uncorrelated: the residuals are autocorrelated. The histogram suggests the residuals are roughly normally distributed, with a mean close to zero.

We therefore conclude that the preferred model has not captured all the information in the data, so its prediction intervals in particular may be unreliable.

checkresiduals(fcast)
## Warning in checkresiduals(fcast): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,Ad,N)
## Q* = 24.138, df = 3, p-value = 2.337e-05
## 
## Model df: 5.   Total lags used: 8
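
A couple of quick numeric checks back up the plots (a sketch; Box.test() reproduces the Ljung-Box statistic reported by checkresiduals()):

# Residual mean should be close to zero
mean(residuals(fcast), na.rm = TRUE)

# Same Ljung-Box test as above: 8 lags, 5 fitted degrees of freedom
Box.test(residuals(fcast), lag = 8, fitdf = 5, type = "Ljung-Box")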