Question 1

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books (data set books).

A) Plot the series and discuss the main features of the data.

library(fpp)
Paperback <- books[,1]
Hardcover <- books[,2]

autoplot.zoo(books)

Both time series show a slight increasing trend and significant daily variation. There does not appear to be any seasonality.


B) Use simple exponential smoothing with the ses function (setting initial=“simple”) and explore different values of alpha for the paperback series. Record the within-sample SSE for the one-step forecasts. Plot SSE against alpha and find which value of alpha works best. What is the effect of alpha on the forecasts?

# Creat a function that returns the within-sample SSEs for a
# sequence of alphas

SSE_by_Alpha_ses <- function(timeseries, alpha_seq){
     
     SSEs <- vector()
     
     for(i in alpha_seq){
          model <- ses(timeseries, initial = 'simple', alpha = i)
          SSEs <- c(SSEs, sum((model$residuals)^2))
     }
     
     out <- data.frame(Alpha = alpha_seq,
                       SSE = SSEs)
     
     return(out)
}

PPB_alphas <- SSE_by_Alpha_ses(Paperback, seq(0.01, 0.99, 0.01))

plot(PPB_alphas, type = 'l', lwd = 2, 
     main = 'Relationship between within-sample SSE and Alpha')

abline(v = PPB_alphas[which.min(PPB_alphas$SSE),1], col = 'red')
abline(h = PPB_alphas[which.min(PPB_alphas$SSE),2], col = 'red')

legend('top',
       legend = 'Minimum SSE (alpha = 0.21, SSE = 36314.59)',
       lty = 1,
       col = 'red')

Alpha controls the degree to which forecasts depend on past information, an alpha closer to zero means more information from the past is incorporated into the forecast, while an alpha close to one means that the most recent observation is primarily used for the forecast. In this case, the within-sample forecast accuracy increased as alpha increased up to 0.21 (the alpha at which the within-sample SSE is at a minimum), but then the forecast accuracy began to decrease as alpha increased past 0.21. Thus, it appears that an alpha of 0.21 works best.


C) Now let ses select the optimal value of alpha. Use this value to generate forecasts for the next four days. Compare your results with 2.

When the question says “Compare your results with 2.” I am assuming it means compare the results of the ses model with the alpha level chosen in part B to ses model where R chooses the optimal value of alpha (but with initial still set to ‘simple’). Also, since the question says forecast the next four days I am assuming it in fact wants forecasts for the next four days, and that it is not asking us to partition the data into a training set of 26 days and a validation set of 4 days.

PPB_ses_aR <- ses(Paperback, initial = 'simple', h = 4)
PPB_ses_0.21 <- ses(Paperback, initial = 'simple', alpha = 0.21, h = 4)

plot(Paperback, type="b",
     ylab="Books Sold", xlab="Day", main="", 
     xlim = c(0,34))

lines(PPB_ses_aR$fitted, type = 'b', col = 'blue', lwd = 4)
lines(PPB_ses_aR$mean, type = 'b', col = 'blue', lwd = 4)

lines(PPB_ses_0.21$fitted, type = 'b', col = 'red', lwd = 2)
lines(PPB_ses_0.21$mean, type = 'b', col = 'red', lwd = 2)

legend('bottomright',
       legend = c('Data', 'Alpha Chosen by R', 'Alpha = 0.21'),
       col = c('black', 'blue', 'red'),
       lty = 1, pch = 1, bty  = 'n')

From the plot above it is clear that the two forecasting models are almost identical. I used thicker lines and circles to plot the forecasts and fitted values for the model with alpha chosen by R, otherwise the plotted values for the models would be idistinguishable. The SSE for within-sample one-step ahead forecasts for the two models are also almost identical, as shown in the table below.

library(pander)
x <- data.frame(SSE = c(sum(PPB_ses_0.21$residuals^2),
                        sum(PPB_ses_aR$residuals^2)))

rownames(x) <- c('Alpha = 0.21', 'Alpha Chosen by R')

pandoc.table(x,
             digits = 8)
  SSE
Alpha = 0.21 36314.586
Alpha Chosen by R 36313.98


D) Repeat but with initial=“optimal”. How much difference does an optimal initial level make?

PPB_ses_opt <- ses(Paperback, initial = 'optimal', h = 4)

plot(Paperback, type="b", lty = 2,
     ylab="Books Sold", xlab="Day", main="", 
     xlim = c(0,34))

lines(PPB_ses_aR$fitted, type = 'b', col = 'blue', lwd = 4)
lines(PPB_ses_aR$mean, type = 'b', col = 'blue', lwd = 4)

lines(PPB_ses_0.21$fitted, type = 'b', col = 'red', lwd = 2)
lines(PPB_ses_0.21$mean, type = 'b', col = 'red', lwd = 2)

lines(PPB_ses_opt$fitted, type = 'b', col = 'green', lwd = 1, pch = 19)
lines(PPB_ses_opt$mean, type = 'b', col = 'green', lwd = 1, pch = 19)

legend('bottomright',
       legend = c('Data', 'Alpha Chosen by R', 'Alpha = 0.21', 'Optimal'),
       col = c('black', 'blue', 'red', 'green'),
       lty = c(2,1,1,1), pch = c(1,1,1,19), bty  = 'n')

The model with intial set to “optimal” produces sales predictions slightly lower than those of the other models. This difference is most noticable in the beginning of the time series, since this is where the intial value for level matters most. The table below reports the SSEs of the within-sample one-step ahead forecasts for the three models. The SSE for the optimized initial state is lower than that of the other models.

x <- data.frame(SSE = c(sum(PPB_ses_0.21$residuals^2),
                        sum(PPB_ses_aR$residuals^2),
                        sum(PPB_ses_opt$residuals^2)))

rownames(x) <- c('Alpha = 0.21', 'Alpha Chosen by R', 'Optimal')

pandoc.table(x,
             digits = 8)
  SSE
Alpha = 0.21 36314.586
Alpha Chosen by R 36313.98
Optimal 33944.819

D) Repeat steps (b)-(d) with the hardcover series.

HC_alphas <- SSE_by_Alpha_ses(Hardcover, seq(0.01, 0.99, 0.01))

plot(HC_alphas, type = 'l', lwd = 2, 
     main = 'Relationship between within-sample SSE and Alpha')

abline(v = HC_alphas[which.min(HC_alphas$SSE),1], col = 'red')
abline(h = HC_alphas[which.min(HC_alphas$SSE),2], col = 'red')

legend('top',
       legend = 'Minimum SSE (alpha = 0.35, SSE = 30758.47)',
       lty = 1,
       col = 'red')

As with the paperback series, the accuraccy increases as alpha increases up to a certain point before decreasing. However, the curve before the optimal alpha is much steeper than that observed for the paperback series. This indicates that accuracy with the training data increases much more rapidly with alpha (up until the optimal alpha of 0.35) than it did in the paperback series. From the tested alphas, an alpha of 0.35 appears to be the best.

HC_ses_aR <- ses(Hardcover, initial = 'simple', h = 4)
HC_ses_0.35 <- ses(Hardcover, initial = 'simple', alpha = 0.35, h = 4)

plot(Hardcover, type="b",
     ylab="Books Sold", xlab="Day", main="", 
     xlim = c(0,34))

lines(HC_ses_aR$fitted, type = 'b', col = 'blue', lwd = 4)
lines(HC_ses_aR$mean, type = 'b', col = 'blue', lwd = 4)

lines(HC_ses_0.35$fitted, type = 'b', col = 'red', lwd = 2)
lines(HC_ses_0.35$mean, type = 'b', col = 'red', lwd = 2)

legend('bottomright',
       legend = c('Data', 'Alpha Chosen by R', 'Alpha = 0.35'),
       col = c('black', 'blue', 'red'),
       lty = 1, pch = 1, bty  = 'n')

As with the paperback series, the forecast model fitted with the manually chosen alpha and the forecast model with alpha chosen by R are almost identical.

HC_ses_opt <- ses(Hardcover, initial = 'optimal', h = 4)

plot(Hardcover, type="b", lty = 2,
     ylab="Books Sold", xlab="Day", main="", 
     xlim = c(0,34))

lines(HC_ses_aR$fitted, type = 'b', col = 'blue', lwd = 4)
lines(HC_ses_aR$mean, type = 'b', col = 'blue', lwd = 4)

lines(HC_ses_0.35$fitted, type = 'b', col = 'red', lwd = 2)
lines(HC_ses_0.35$mean, type = 'b', col = 'red', lwd = 2)

lines(HC_ses_opt$fitted, type = 'b', col = 'green', lwd = 1, pch = 19)
lines(HC_ses_opt$mean, type = 'b', col = 'green', lwd = 1, pch = 19)

legend('bottomright',
       legend = c('Data', 'Alpha Chosen by R', 'Alpha = 0.35', 'Optimal'),
       col = c('black', 'blue', 'red', 'green'),
       lty = c(2,1,1,1), pch = c(1,1,1,19), bty  = 'n')

For this series, the forecasting model with the optimal initial states chosen is very close to the models with the initial states set as the first value in the time series. The primary difference is the forecast for the first few points in the time series. The table below reports the SSE for the within-sample one-step ahead forecasts for each of the models. As with the paperback series, the model with the initial states and alpha chosen optimally has the lowest SSE.

x <- data.frame(SSE = c(sum(HC_ses_0.35$residuals^2),
                        sum(HC_ses_aR$residuals^2),
                        sum(HC_ses_opt$residuals^2)))

rownames(x) <- c('Alpha = 0.35', 'Alpha Chosen by R', 'Optimal')

pandoc.table(x,
             digits = 8)
  SSE
Alpha = 0.35 30758.472
Alpha Chosen by R 30758.067
Optimal 30587.692



Question 2

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

PPB_holt <- holt(Paperback, h = 4)
HC_holt <- holt(Hardcover, h = 4)

par(mfrow = c(2,1))

plot(Paperback, type="o", lty = 2,
     ylab="Books Sold", xlab="Day", main="Paperback", 
     xlim = c(0,34))

lines(PPB_holt$fitted, type = 'o', col = 'blue', lwd = 2)
lines(PPB_holt$mean, type = 'o', col = 'blue', lwd = 2)

legend('bottomright',
       legend = c('Data', 'Holt\'s Linear Forecast'),
       col = c('black', 'blue'),
       lty = c(2,1), pch = 1, bty  = 'n')

plot(Hardcover, type="o", lty = 2,
     ylab="Books Sold", xlab="Day", main="Hardcover", 
     xlim = c(0,34))

lines(HC_holt$fitted, type = 'o', col = 'blue', lwd = 2)
lines(HC_holt$mean, type = 'o', col = 'blue', lwd = 2)

legend('bottomright',
       legend = c('Data', 'Holt\'s Linear Forecast'),
       col = c('black', 'blue'),
       lty = c(2,1), pch = 1, bty  = 'n')


A) Compare the SSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. Discuss the merits of the two forecasting methods for these data sets.

The table below reports the SSEs for the within-sample one-step ahead forecasts for the two series using simple exponential smoothing (SES) and Holt’s linear method. Holt’s linear method is more appropriate for both series since both series exhibit some trend. This is most noticeable for the hardcover series which has a mode pronounced trend than the paperback series.

x <- data.frame(Hardcover = c(sum(HC_ses_opt$residuals^2),
                              sum(HC_holt$residuals^2)),
                Paperback = c(sum(PPB_ses_opt$residuals^2),
                              sum(PPB_holt$residuals^2)))

rownames(x) <- c('SES', 
                 'Holt\'s Linear Method')

pandoc.table(x,
             digits = 8)
  Hardcover Paperback
SES 30587.692 33944.819
Holt’s Linear Method 22581.831 30074.171


B) Compare the forecasts for the two series using both methods. Which do you think is best?

PPB_holt <- holt(Paperback, h = 4)
HC_holt <- holt(Hardcover, h = 4)

par(mfrow = c(2,1))

plot(Paperback, type="o", lty = 2,
     ylab="Books Sold", xlab="Day", main="Paperback", 
     xlim = c(0,34))

lines(PPB_ses_opt$mean, type = 'o', col = 'red', lwd = 2)
lines(PPB_holt$mean, type = 'o', col = 'blue', lwd = 2)

legend('bottomright',
       legend = c('Data', 
                  'Holt\'s Linear Forecast',
                  'SES Forecast'),
       col = c('black', 'blue', 'red'),
       lty = c(2,1,1), pch = 1, bty  = 'n')

plot(Hardcover, type="o", lty = 2,
     ylab="Books Sold", xlab="Day", main="Hardcover", 
     xlim = c(0,34))

lines(HC_ses_opt$mean, type = 'o', col = 'red', lwd = 2)
lines(HC_holt$mean, type = 'o', col = 'blue', lwd = 2)

legend('bottomright',
       legend = c('Data', 
                  'Holt\'s Linear Forecast',
                  'SES'),
       col = c('black', 'blue', 'red'),
       lty = c(2,1,1), pch = 1, bty  = 'n')

The SES and Holt’s forecasts for the paperback series are very similar and both would be acceptable. If I had to chose one of the two I would chose the Holt’s forecast since it captures some of the linear trend. There is more difference between the SES and Holt’s forecast for the hardcover series. For this series I think Holt’s method is best because it includes the increasing trend in the series.


C) Calculate a 95% prediction interval for the first forecast for each series using both methods, assuming normal errors. Compare your forecasts with those produced by R.

If we assume normality we can estimate the standard deviation of the errors by using the sigma2 value, which is reported as part of the object returned by ses() and holt(), to estimate the variance of the error component of the model. For a 95% prediction interval, the upper bound is:

\[ Point Estimate + 1.96*\sqrt{\widehat{Variance}} \]

And the lower bound is:

\[ Point Estimate - 1.96*\sqrt{\widehat{Variance}} \] The 95% prediction intervals for the first forecasts for each series are shown below. Intervals calculated assuming normality and the intervals produced by R are both reported.

PPB_Upper <- c(PPB_holt$mean[1] + 1.96*sqrt(PPB_holt$model$sigma2),
               as.numeric(PPB_holt$upper[1,2]))

PPB_Lower <- c(PPB_holt$mean[1] - 1.96*sqrt(PPB_holt$model$sigma2),
               as.numeric(PPB_holt$lower[1,2]))

HC_Upper <- c(HC_holt$mean[1] + 1.96*sqrt(HC_holt$model$sigma2),
              as.numeric(HC_holt$upper[1,2]))

HC_Lower <- c(HC_holt$mean[1] - 1.96*sqrt(HC_holt$model$sigma2),
              as.numeric(HC_holt$lower[1,2]))

x <- cbind(PPB_Lower, PPB_Upper, HC_Lower, HC_Upper)
rownames(x) <- c('Assuming Normality', 'Produced by R')

pandoc.table(x,
             col.names = c('Paperback Lower Bound',
                           'Paperback Upper Bound',
                           'Hardcover Lower Bound',
                           'Hardcover Upper Bound'),
             caption = '95% prediction intervals for the first forecast for the paperback and hardcover books series. The intervals obtained assuming normality are compared to those produced by R.',
             split.table = Inf,
             digits = 7)
95% prediction intervals for the first forecast for the paperback and hardcover books series. The intervals obtained assuming normality are compared to those produced by R.
  Paperback Lower Bound Paperback Upper Bound Hardcover Lower Bound Hardcover Upper Bound
Assuming Normality 144.9808 269.0952 193.576 301.1247
Produced by R 144.982 269.0941 193.577 301.1237

From the table above, we can see that the intervals constructed assuming normality are almost identical to those produced by R.