DATA 624 Homework 5

library(knitr)
library(rmdformats)

## Global options
options(max.print="31")
opts_chunk$set(
                 cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
opts_knit$set(width=31)

library(fma)

## Loading required package: forecast

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(ggplot2)

7.8.1

Consider the pigs series — the number of pigs slaughtered in Victoria each month.

(a)

Use the ses() function in R to find the optimal values of α and ℓ₀, and generate forecasts for the next four months.

Ans: The optimal alpha (α) is .2971 and level at time 0 (ℓ₀) is 77260.0561.

str(pigs)

 Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...

head(pigs)

        Jan    Feb    Mar    Apr    May    Jun
1980  76378  71947  33873  96428 105084  95741

tail(pigs, 5)

        Apr    May    Jun    Jul    Aug
1995  84307 114896 106749  87892 100506

# Time series display of pigs
tsdisplay(pigs)

fc = ses(pigs, h = 4)
print(fc$model)

Simple exponential smoothing 

Call:
 ses(y = pigs, h = 4) 

  Smoothing parameters:
    alpha = 0.2971 

  Initial states:
    l = 77260.0561 

  sigma:  10308.58

     AIC     AICc      BIC 
4462.955 4463.086 4472.665

print(fc)

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

# adding a continuous scale for x to override the original scale is key!
autoplot(fc) + autolayer(fitted(fc), series = "Fitted") + ylab("Number of pigs slaughtered") + 
    xlab("Year") + scale_x_continuous(breaks = seq(1980, 1996))

(b)

Compute a 95% prediction interval for the first forecast using \(\widehat{y}\)±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

Ans: Let the formula \(\widehat{y}\)±1.96s be (1).

print("Interval produced by R")

[1] "Interval produced by R"

paste("95% prediction interval for the first forecast is: [", round(fc$lower[1, "95%"], 
    2), ", ", round(fc$upper[1, "95%"], 2), "]")

[1] "95% prediction interval for the first forecast is: [ 78611.97 ,  119020.84 ]"

print("Interval calculated using the formula (1) ")

[1] "Interval calculated using the formula (1) "

paste("95% prediction interval for the first forecast is: [", round(fc$mean[1] - 
    1.96 * sd(fc$residuals), 2), ", ", round(fc$mean[1] + 1.96 * sd(fc$residuals), 
    2), "]")

[1] "95% prediction interval for the first forecast is: [ 78679.97 ,  118952.84 ]"

Apparently, there are some differences between the 2 intervals produced. Intervals produced by R seemed to be encompassing the one calculated with formula (1).

7.8.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

(a)

Plot the series and discuss the main features of the data.

autoplot(books) + ggtitle("Daily sales of paperback and hardcover books at the same store")

tsdisplay(books)

Ans: both paperback and hardcover shows upward trend. But neither shows a seasonal or cyclical pattern.

(b)

Use the ses() function to forecast each series, and plot the forecasts.

ses_pb <- ses(books[, "Paperback"], h = 4)
ses_hc <- ses(books[, "Hardcover"], h = 4)

plot1 <- autoplot(books[, "Paperback"], series = "Paperback") + autolayer(ses_pb, 
    series = "Paperback", PI = T)
plot2 <- autoplot(books[, "Hardcover"], series = "Hardcover", colour = "#00B9E3") + 
    autolayer(ses_hc, series = "Hardcover", PI = T, colour = "#00B9E3")
plot1 + ylab("Sales Amount") + ggtitle("Daily Same-store Sales of Paperback Books")

plot2 + ylab("Sales Amount") + ggtitle("Daily Same-store Sales of Hardcover Books") + 
    geom_line(colour = "#00B9E3")

(c)

Compute the RMSE values for the training data in each case.

text1 <- paste("RMSE for Paperback book sales is:", round(accuracy(ses_pb)[2], 2))
text2 <- paste("RMSE for Hardcover book sales is:", round(accuracy(ses_hc)[2], 2))
cat(text1, text2, sep = "\n")

RMSE for Paperback book sales is: 33.64
RMSE for Hardcover book sales is: 31.93

7.8.6

(a)

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

fc_pb_h <- holt(books[, "Paperback"], h = 4)
fc_hc_h <- holt(books[, "Hardcover"], h = 4)

autoplot(books[, "Paperback"], series = "Paperback") + autolayer(fc_pb_h, series = "Paperback", 
    PI = F) + autolayer(books[, "Hardcover"], series = "Hardcover") + autolayer(fc_hc_h, 
    series = "Hardcover", PI = F) + ylab("Sales Amount") + ggtitle("Daily Same-store Sales of Paperback and Hardcover Books (Holt's)")

(b)

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

acc_h1 <- paste("RMSE for Holt's method for Paperback book sales is:", round(accuracy(fc_pb_h)[2], 
    2))
acc_h2 <- paste("RMSE for Holt's method for Hardcover book sales is:", round(accuracy(fc_hc_h)[2], 
    2))
cat(acc_h1, acc_h2, sep = "\n")

RMSE for Holt's method for Paperback book sales is: 31.14
RMSE for Holt's method for Hardcover book sales is: 27.19

The RMSE for these 2 times series improved using Holt’s method. Holt’s method does have 2 parameters, level and trend while SES only has one in level. The merit of Holt’s method is when it comes to series with increasing trend as it minimizes the training RMSE. In the case where there isn’t a clear trend, a simple exponential smoothing (SES) suffices. And the merit here is it does the job and is simple enough to explain.

(c)

Compare the forecasts for the two series using both methods. Which do you think is best?

# SES - paperback
accuracy(ses_pb)

                   ME     RMSE     MAE       MPE     MAPE      MASE       ACF1
Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522

# SES - hard cover
accuracy(ses_hc)

                   ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763

# Holt's - paper back
accuracy(fc_pb_h)

                    ME     RMSE      MAE       MPE     MAPE      MASE
Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
                   ACF1
Training set -0.1750792

# Holt's - hard cover
accuracy(fc_hc_h)

                     ME     RMSE      MAE       MPE    MAPE      MASE
Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
                    ACF1
Training set -0.03245186

Ans: As you can see both the RMSE and MAE, Holt’s method turns out to score better (minimized errors) for both series. There is no doubt that Holt’s method is the best, esp. for the hard cover series.

(d)

Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

# PB

# Self calculate (1)
round(fc_pb_h$mean[1] - 1.96 * accuracy(fc_pb_h)[2], 2)

[1] 148.44

# Self calculate (1)
round(fc_pb_h$mean[1] + 1.96 * accuracy(fc_pb_h)[2], 2)

[1] 270.5

# Holt's
fc_pb_h$lower[1, "95%"]

    95% 
143.913

fc_pb_h$upper[1, "95%"]

     95% 
275.0205

# Self calculate (2)
round(ses_pb$mean[1] - 1.96 * accuracy(ses_pb)[2], 2)

[1] 141.18

# Self calculate (2)
round(ses_pb$mean[1] + 1.96 * accuracy(ses_pb)[2], 2)

[1] 273.04

# SES
ses_pb$lower[1, "95%"]

    95% 
138.867

ses_pb$upper[1, "95%"]

     95% 
275.3523

# HC Self calculate (1)
round(fc_hc_h$mean[1] - 1.96 * accuracy(fc_hc_h)[2], 2)

[1] 196.87

# Self calculate (1)
round(fc_hc_h$mean[1] + 1.96 * accuracy(fc_hc_h)[2], 2)

[1] 303.47

# Holt's
fc_hc_h$lower[1, "95%"]

     95% 
192.9222

fc_hc_h$upper[1, "95%"]

     95% 
307.4256

# Self calculate (2)
round(ses_hc$mean[1] - 1.96 * accuracy(ses_hc)[2], 2)

[1] 176.98

# Self calculate (2)
round(ses_hc$mean[1] + 1.96 * accuracy(ses_hc)[2], 2)

[1] 302.14

# SES
ses_hc$lower[1, "95%"]

     95% 
174.7799

ses_hc$upper[1, "95%"]

     95% 
304.3403

Ans:

Definitions:

Formula (1) \(\widehat{y_{h}}±1.96s_{h}\)

Formula (2) \(\widehat{y_{ses}}±1.96s_{ses}\)

Paperback Series	Lower Bound	Upper Bound
Calculated using formula (1)	148.44	270.50
Holt’s	`143.91`	`275.02`
Calculated using formula (2)	`141.18`	`273.04`
SES	`138.87`	`275.35`

Hardcover Series	Lower Bound	Upper Bound
Calculated using formula (1)	`196.87`	`303.47`
Holt’s	`192.92`	`307.42`
Calculated using formula (2)	`176.98`	`302.14`
SES	`174.78`	`174.77`

7.8.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

Which model gives the best RMSE?

str(eggs)

 Time-Series [1:94] from 1900 to 1993: 277 315 315 321 315 ...

head(eggs)

Time Series:
Start = 1900 
End = 1905 
Frequency = 1 
[1] 276.79 315.42 314.87 321.25 314.54 317.92

tail(eggs)

Time Series:
Start = 1988 
End = 1993 
Frequency = 1 
[1] 64.55 80.36 79.79 74.79 64.86 62.27

fc.holt <- holt(eggs, h = 100)
fc.holt.damped <- holt(eggs, damped = TRUE, h = 100)
fc.holt.boxcox <- holt(eggs, lambda = BoxCox.lambda(eggs), h = 100)
fc.holt.boxcox.damped <- holt(eggs, lambda = BoxCox.lambda(eggs), damped = TRUE, 
    h = 100)

autoplot(eggs) + autolayer(fc.holt, series = "4. Holt's method", PI = FALSE) + autolayer(fc.holt.damped, 
    series = paste("1. Holt's Method ", "\n", "+ damped trend"), PI = FALSE) + autolayer(fc.holt.boxcox, 
    series = paste("3. Holt's Method ", "\n", "+ BoxCox transformation"), PI = FALSE) + 
    autolayer(fc.holt.boxcox.damped, series = paste("2. Holt's Method ", "\n", "+ damped trend ", 
        "\n", "+ BoxCox transformation"), PI = FALSE) + ggtitle("Forecasts from Various Options in Holt's Method") + 
    xlab("Year") + ylab("Eggs") + guides(colour = guide_legend(title = "Forecasting Options")) + 
    theme(legend.text = element_text(size = 7.1))

Ans:

# 1
paste("RMSE for Holt's Method w/ damped trend:", round(accuracy(fc.holt.damped)[2], 
    2))

[1] "RMSE for Holt's Method w/ damped trend: 26.54"

# 2
paste("RMSE for Holt's Method w/ damped trend and w/ BoxCox transformation:", round(accuracy(fc.holt.boxcox.damped)[2], 
    2))

[1] "RMSE for Holt's Method w/ damped trend and w/ BoxCox transformation: 26.53"

# 3
paste("RMSE for Holt's Method w/ BoxCox transformation:", round(accuracy(fc.holt.boxcox)[2], 
    2))

[1] "RMSE for Holt's Method w/ BoxCox transformation: 26.39"

# 4
paste("RMSE for Holt's Method:", round(accuracy(fc.holt)[2], 2))

[1] "RMSE for Holt's Method: 26.58"

W.r.t. just RMSE, which is the de facto accuracy metric for linear estimations, I declare that Holt’s Method with a BoxCox transformation alone proves to be the most accurate option of the Holt’s Method. It provided good point estimate forecasts. Moreover, it seems the smoothing effect of BoxCox is already sufficient. Putting it alongside with damping option is going to overdo the forecast. What the damping option really does is to dampen the trend into a flat line. The forecasts above does show that damping option did too much of an adverse effect to the series such that it completely removed the trend and thus making the minimization of the errors (via RMSE) not as ideal as one would like.

7.8.8

Recall your retail time series data (from Exercise 3 in Section 2.10).

(a)

Why is multiplicative seasonality necessary for this series?

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349398A"], frequency = 12, start = c(1982, 4))
autoplot(myts) + ylab("Retail Food Sales") + ggtitle("New South Wales - Food Sales")

tsdisplay(myts)

Ans: As you can see that the magnitude of the seasonal effect increases over time, that’s the reason why a multiplicative seasonality is necessary.

(b)

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fc_mult.hw <- hw(myts, h = 120, seasonal = "multiplicative")
fc_mult.hw.damped <- hw(myts, h = 120, seasonal = "multiplicative", damped = TRUE)

autoplot(myts) + autolayer(fc_mult.hw, series = "Holt multi", PI = FALSE) + autolayer(fc_mult.hw.damped, 
    series = "Holt multi damped", PI = FALSE) + ggtitle("Multiplicative seasonal forecast") + 
    xlab("Year") + ylab("Retail Food Sales")

(c)

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

fc_mult.hw1_mse <- tsCV(myts, hw, h = 1, seasonal = "multiplicative")
fc_mult.hw1.damped_mse <- tsCV(myts, hw, h = 1, seasonal = "multiplicative", damped = TRUE)

paste("RMSE of Holt-Winters’ seaonal multiplicative method is:", round(sqrt(mean(fc_mult.hw1_mse^2, 
    na.rm = TRUE)), 2))

[1] "RMSE of Holt-Winters’ seaonal multiplicative method is: 33.04"

paste("RMSE of Holt-Winters’ seaonal multiplicative damped method is:", round(sqrt(mean(fc_mult.hw1.damped_mse^2, 
    na.rm = TRUE)), 2))

[1] "RMSE of Holt-Winters’ seaonal multiplicative damped method is: 33.45"

I do prefer Holt-Winters’ seaonal multiplicative method.

(d)

Check that the residuals from the best method look like white noise.

# checkresiduals(fc_mult.hw1_mse)
ggAcf(fc_mult.hw1_mse)

Ans:

As there are more than 5% of the spikes in the ACF plot are outside of the blue dotted lines, the series is probably not white noise.

(e)

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 7 in Section 3.7?

# tail(myts)

myts.train <- window(myts, end = c(2010, 12))
myts.test <- window(myts, start = 2011)

fc.hw.damped <- hw(myts.train, seasonal = "multiplicative", damped = TRUE)
fc.seasonal.naive <- snaive(myts.train)

method1 <- paste("RMSE for Holt-Winter's Damped Method =", round(accuracy(fc.hw.damped, 
    myts.test)[2], 2))
method2 <- paste("RMSE for Seasonal Naive Method = ", round(accuracy(fc.seasonal.naive, 
    myts.test)[2], 2))
cat(method1, method2, sep = "\n")

RMSE for Holt-Winter's Damped Method = 8.2
RMSE for Seasonal Naive Method =  115

Ans: Yes. I am able to beat seasonal naive method by a factor of 14 times. My RMSE for Holt-Winter’s Damped method is 8.2.

7.8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# STL - Seasonal Decomposition of Time Series by Loess
lambda = BoxCox.lambda(myts.train)
fc.stl = stlf(myts.train, lambda = lambda)
# ETS - Exponential smoothing state space model
fc.ets = ets(seasadj(decompose(myts.train, "multiplicative")), model = "ZZZ", damped = NULL, 
    alpha = NULL, beta = NULL, gamma = NULL, phi = NULL, lambda = lambda, biasadj = FALSE, 
    additive.only = FALSE, restrict = TRUE, allow.multiplicative.trend = TRUE)

autoplot(myts.train, series = "train") + autolayer(forecast(fc.stl, h = 24, PI = F), 
    series = paste("STL Forecast ", "\n", "+ Box-Cox", "\n", " transformation")) + 
    autolayer(forecast(fc.ets, h = 24, PI = F), series = "ETS Forcast") + autolayer(myts.test, 
    series = "test")

method_stl <- paste("RMSE for STL Method is:", round(accuracy(fc.stl, myts.test)[2], 
    2))
method_ets <- paste("RMSE for ETS Method is:", round(sqrt(fc.ets$mse), 2))
cat(method_stl, method_ets, sep = "\n")

RMSE for STL Method is: -101.37
RMSE for ETS Method is: 0.05

Ans: STL decomposition method applied w/ a Box-Cox transformed series got outperformed the ETS method on seasonally-adjusted series, by a large margin, in terms of RMSE 101.37 to 0.05. Holt-Winter’s damped method was at 8.2 in terms of RMSE. So ETS is the go-to method.