R Notebook

DATA 624 - HW 5 Exercises - Zachary Herold

A. Loading Packages

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth

1. Victoria Pigs Dataset

Consider the pigs series - the number of pigs slaughtered in Victoria each month (Jan 1980 - Aug 1995).

tsdisplay(pigs)

a. Use the ses() function in R to find the optimal values of and alpha and lambda-naught, and generate forecasts for the next four months.

fc.pigs <- ses(pigs, h=4)
print(fc.pigs$model)

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

Checking the SES model parameters, we note alpha (smoothing paramter) of 0.2971 and lambda-naught (initial value) of 77260.0561.

autoplot(fc.pigs) +
  forecast::autolayer(fitted(fc.pigs), series="Fitted") +
  ylab("Slaughtered Pigs/month") + xlab("Year")

b. Compute a 95% prediction interval for the first forecast using y-hat = +- 1.96, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

m.pigs <- mean(window(pigs, end=c(1995,4)))
print(paste0("The mean of the pigs values calculated from the data set is: ", round(m.pigs,3)))

## [1] "The mean of the pigs values calculated from the data set is: 90382.391"

print(paste0("The mean of the pigs values given by ses() model is:  ", round(fc.pigs$mean[1],3)))

## [1] "The mean of the pigs values given by ses() model is:  98816.406"

s.pigs <- sd(fc.pigs$model$residuals)
print(paste0("The standard deviation of the residuals calculated from residuals is: ", round(s.pigs,3)))

## [1] "The standard deviation of the residuals calculated from residuals is: 10273.693"

print(paste0("The standard deviation of the residuals given by ses() model is:  ", round(fc.pigs$model$sigma2^0.5,3)))

## [1] "The standard deviation of the residuals given by ses() model is:  10308.576"

The ses() model gives the 95% confidence interval for the first forecasted value as: [78611.97, 119020.8], verified below.

fc.pigs[c('upper','lower')]

## $upper
##               80%      95%
## Sep 1995 112027.4 119020.8
## Oct 1995 112598.3 119894.0
## Nov 1995 113146.5 120732.4
## Dec 1995 113674.4 121539.8
## 
## $lower
##               80%      95%
## Sep 1995 85605.43 78611.97
## Oct 1995 85034.52 77738.83
## Nov 1995 84486.34 76900.46
## Dec 1995 83958.37 76092.99

fc.pigs$mean[1] - 1.96 * fc.pigs$model$sigma2^0.5

## [1] 78611.6

fc.pigs$mean[1] + 1.96 * fc.pigs$model$sigma2^0.5

## [1] 119021.2

Calculating the confidence interval by hand gives very different results, with a confidence interval of [70245.95, 110518.8].

m.pigs - 1.96 * s.pigs

## [1] 70245.95

m.pigs + 1.96 * s.pigs

## [1] 110518.8

The difference in these values can likely accounted for by the fact that the mean for the ses() model is also weight-adjusted according to the smoothing parameter.

5. Forecasting Book Sales

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

a. Plot the series and discuss the main features of the data.

The first column gives daily sales of paperbacks, while the second is daily sales of hardcovers, for one month only.

data(books)
pb <- window(books[,1], end=26)
hc <- window(books[,2], end=26)

checkresiduals(books[,1])

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

checkresiduals(books[,2])

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Applying the ets() function to both the paperback and hardcover time series, we note that the model gives the best approximations as follows: paperback: ETS(A,N,N) with additive error term, no trend nor seasonality hardcover: ETS(M,A,N) with multiplicative error term, additive trend and no seasonality

The trend in the hardcover, but not the paperback, data is supported by the autocorrelation function, showing significance up to two lags.

ets(books[,1])

## ETS(A,N,N) 
## 
## Call:
##  ets(y = books[, 1]) 
## 
##   Smoothing parameters:
##     alpha = 0.1684 
## 
##   Initial states:
##     l = 170.8464 
## 
##   sigma:  34.8183
## 
##      AIC     AICc      BIC 
## 318.9747 319.8978 323.1783

ets(books[,2])

## ETS(M,A,N) 
## 
## Call:
##  ets(y = books[, 2]) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 149.3465 
##     b = 3.2554 
## 
##   sigma:  0.1462
## 
##     AIC    AICc     BIC 
## 309.583 312.083 316.589

b. Use the ses() function to forecast each series, and plot the forecasts.

fc.pb.ses <- ses(pb, h=4)
autoplot(fc.pb.ses) +
  forecast::autolayer(fitted(fc.pb.ses), series="Fitted") +
  ylab("Daily Paperback sales") + xlab("Day of Month")

fc.hc.ses <- ses(hc, h=4)
autoplot(fc.hc.ses) +
  forecast::autolayer(fitted(fc.hc.ses), series="Fitted") +
  ylab("Daily Paperback sales") + xlab("Day of Month")

c. Compute the RMSE values for the training data in each case.

accuracy(fc.pb.ses,books[,1])

##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  5.740754 34.07320 27.41353 -0.4571133 15.81877 0.6738824
## Test set     26.621870 33.88176 28.56094 11.3497551 12.38117 0.7020879
##                    ACF1 Theil's U
## Training set -0.1616501        NA
## Test set     -0.4716847 0.9208597

The paperback training set has an RMSE of 34.07320, versus 33.88176 for the test data. That the RMSE for the test set is lower than the training set error at least shows that we are not overfitting.

accuracy(fc.hc.ses,books[,2])

##                     ME     RMSE      MAE      MPE     MAPE      MASE
## Training set  7.017366 30.44212 25.53629 1.814926 13.34629 0.8111909
## Test set     24.843828 40.61285 32.29691 8.697859 12.40586 1.0259503
##                     ACF1 Theil's U
## Training set -0.09173184        NA
## Test set     -0.71082349 0.7786365

The hardcover training set RMSE (30.44212) is lower than paperbacks’. The test set RMSE is 40.61285, about 33% higher.

6. Forecasting Book Sales (Continued)

a. Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

fc.pb.holt <- holt(pb, h=4)
autoplot(fc.pb.holt) +
  forecast::autolayer(fitted(fc.pb.holt), series="Fitted") +
  ylab("Daily Paperback Sales") + xlab("Day of Month")

fc.hc.holt <- holt(hc, h=4)
autoplot(fc.hc.holt) +
  forecast::autolayer(fitted(fc.hc.holt), series="Fitted") +
  ylab("Daily Hardcover Sales") + xlab("Day of Month")

The predicted values are given by the forward-looking means, listed here:

fc.pb.holt$mean

## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
## [1] 199.6435 200.7045 201.7654 202.8263

fc.hc.holt$mean

## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
## [1] 237.4084 240.7282 244.0481 247.3679

b. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

accuracy(fc.pb.holt,books[,1])

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.875031 31.94914 26.36988 -5.838267 16.06684 0.6482272
## Test set     17.265080 26.95439 24.14777  7.035506 10.69651 0.5936030
##                    ACF1 Theil's U
## Training set -0.1460121        NA
## Test set     -0.4725323 0.7436385

RMSE improves from 33.88176 under simple exponential smoothing to 26.95439 under Holt’s linear method when performing 4-period forcasting for the test set. The training set error also improves from 34.07320 to 31.94914, as measured by RMSE.

accuracy(fc.hc.holt,books[,2])

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.1887632 26.57365 22.29866 -2.205423 12.17368 0.7083436
## Test set     -1.6381643 30.92770 28.59009 -2.404104 12.11819 0.9081984
##                     ACF1 Theil's U
## Training set  0.04855399        NA
## Test set     -0.73601579  0.468241

By capturing the trend in the hardcover data, we see sizeable improvements in both training set (26.57365 vs. 30.44212) and test set (30.92770 vs. 40.61285) error, as measured by RMSE.

c. Compare the forecasts for the two series using both methods. Which do you think is best?

fc.pb.ses$mean

## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
## [1] 191.8781 191.8781 191.8781 191.8781

tail(books[,1],4)

## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
## [1] 222 217 188 247

ses: 191.8781 191.8781 191.8781 191.8781 holt: 199.6435 200.7045 201.7654 202.8263

Both models underestimate three of the paperbook values, although holt() less so by capturing some of the upward trend (slope is 1.0737).

fc.hc.ses$mean

## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
## [1] 215.9062 215.9062 215.9062 215.9062

tail(books[,2],4)

## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
## [1] 201 283 220 259

ses: 215.9062 215.9062 215.9062 215.9062 holt: 237.4084 240.7282 244.0481 247.3679

The hardcover data shows wide swings at the end of the time series, which are not predicted by either of the models. ses() does a better job predicting the below-mean values, while holt() does better with the above-mean values.

The holt() model appears to be the better choice from this cursory examination of the last 4 in-sample periods.

d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

Paperback Data

fc.pb.ses[c('upper','lower')]

## $upper
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 237.3277 261.3873
## 28 237.7406 262.0188
## 29 238.1498 262.6446
## 30 238.5555 263.2650
## 
## $lower
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 146.4285 122.3690
## 28 146.0156 121.7375
## 29 145.6064 121.1116
## 30 145.2008 120.4913

fc.pb.holt[c('upper','lower')]

## $upper
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 244.1549 267.7177
## 28 245.2158 268.7787
## 29 246.2767 269.8396
## 30 247.3376 270.9005
## 
## $lower
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 155.1322 131.5693
## 28 156.1931 132.6303
## 29 157.2540 133.6912
## 30 158.3150 134.7521

m.pb <- mean(window(books[,1], end=length(books[,1])-4))
s.pb.ses <- sd(fc.pb.ses$model$residuals)
s.pb.holt <- sd(fc.pb.holt$model$residuals)
print(paste0("The mean of the paperback sales values calculated from the data set is: ", round(m.pb,3)))

## [1] "The mean of the paperback sales values calculated from the data set is: 181.462"

print(paste0("The standard deviation of the residuals calculated from residuals from ses() model is: ", round(s.pb.ses,3)))

## [1] "The standard deviation of the residuals calculated from residuals from ses() model is: 34.251"

print(paste0("The standard deviation of the residuals calculated from residuals from holt() model is: ", round(s.pb.holt,3)))

## [1] "The standard deviation of the residuals calculated from residuals from holt() model is: 32.341"

m.pb - 1.96 * s.pb.ses

## [1] 114.3291

m.pb + 1.96 * s.pb.ses

## [1] 248.594

m.pb - 1.96 * s.pb.holt

## [1] 118.0726

m.pb + 1.96 * s.pb.holt

## [1] 244.8505

Paperback series 95% confidence intervals
ses: [122.3690, 261.3873]
holt: [131.5693, 267.7177]
manual_ses: [114.3291, 248.594]
manual_holt: [118.0726, 244.8505]

Hardcover Data

fc.hc.ses[c('upper','lower')]

## $upper
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 256.5123 278.0079
## 28 259.3115 282.2888
## 29 261.9407 286.3099
## 30 264.4277 290.1134
## 
## $lower
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 175.3000 153.8044
## 28 172.5009 149.5235
## 29 169.8716 145.5024
## 30 167.3847 141.6989

fc.hc.holt[c('upper','lower')]

## $upper
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 274.4307 294.0290
## 28 277.7505 297.3489
## 29 281.0703 300.6687
## 30 284.3901 303.9885
## 
## $lower
## Time Series:
## Start = 27 
## End = 30 
## Frequency = 1 
##         80%      95%
## 27 200.3862 180.7878
## 28 203.7060 184.1076
## 29 207.0258 187.4275
## 30 210.3457 190.7473

m.hc <- mean(window(books[,2], end=length(books[,1])-4))
s.hc.ses <- sd(fc.hc.ses$model$residuals)
s.hc.holt <- sd(fc.hc.holt$model$residuals)
print(paste0("The mean of the hardcover sales values calculated from the data set is: ", round(m.hc,3)))

## [1] "The mean of the hardcover sales values calculated from the data set is: 192.385"

print(paste0("The standard deviation of the residuals calculated from residuals from ses() model is: ", round(s.hc.ses,3)))

## [1] "The standard deviation of the residuals calculated from residuals from ses() model is: 30.209"

print(paste0("The standard deviation of the residuals calculated from residuals from holt() model is: ", round(s.hc.holt,3)))

## [1] "The standard deviation of the residuals calculated from residuals from holt() model is: 27.099"

m.hc - 1.96 * s.hc.ses

## [1] 133.1751

m.hc + 1.96 * s.hc.ses

## [1] 251.5941

m.hc - 1.96 * s.hc.holt

## [1] 139.2701

m.hc + 1.96 * s.hc.holt

## [1] 245.4991

Hardcover series 95% confidence intervals
ses: [153.8044, 278.0079]
holt: [180.7878, 294.0290]
manual_ses: [133.1751, 251.5941]
manual_holt: [139.2701, 245.4991]

8. Australian Retail Data

Recall your retail time series data (from Exercise 3 in Section 2.10).

retaildata <- readxl::read_excel('retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349824F"], frequency=12, start=c(1982,4))
#tail(myts) #ends Dec 2013

lambda <- BoxCox.lambda(myts)
dframe <- cbind(RawSeries = myts, TransformedData = BoxCox(myts,lambda))
autoplot(dframe, facet=TRUE) +
  xlab("Years") + ylab("Sales") +
  ggtitle("Australian Retail Data (ID: A3349824F), \n power parameter lambda = ", round(lambda,5))

a. Why is multiplicative seasonality necessary for this series?

When we apply the ets() function, we observe it selects the ETS(M,Ad,M) model as the best fit. That indicated multiplicative error, damped additive trend with multiplicative seasonality.

ets(myts)

## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = myts) 
## 
##   Smoothing parameters:
##     alpha = 0.4886 
##     beta  = 0.0155 
##     gamma = 0.1357 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 54.5757 
##     b = 0.7204 
##     s = 0.9373 0.8911 0.9461 1.4023 1.0547 1.0183
##            0.958 0.9831 0.9632 0.9424 1.0189 0.8847
## 
##   sigma:  0.0496
## 
##      AIC     AICc      BIC 
## 4022.269 4024.159 4093.240

This is further supported by this plot, which shows an increase in volatility when the values are seasonally-differenced by a lag of 12.

autoplot(diff(myts,12)) +
    xlab("Year") + ylab("") +
    ggtitle("Seasonally-differenced Retail Data")

b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

myts.train <- window(myts, end=c(2005,8))
myts.test <- window(myts, start=c(2005,9))
#length(myts.test)
fc.myts.hw <- hw(myts.train, seasonal="multiplicative", h=100)
fc.myts.damped <- hw(myts.train, seasonal="multiplicative", damped=TRUE, h=100)
autoplot(myts, series='Raw data') +
  autolayer(fc.myts.hw, series="HW multiplicative seasonality", PI=FALSE) + 
  autolayer(fc.myts.damped, series="HW dampened trend", PI=FALSE)

fc.myts.damped$model

## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts.train, h = 100, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.4766 
##     beta  = 0.0169 
##     gamma = 0.1427 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 54.2878 
##     b = 0.5522 
##     s = 0.9401 0.8981 1.0211 1.3581 1.0453 1.0216
##            0.9537 0.9578 0.9487 0.9822 0.9888 0.8844
## 
##   sigma:  0.0564
## 
##      AIC     AICc      BIC 
## 2803.867 2806.478 2869.358

When we use a dampened model, the dampening parameter is determined to be phi = 0.98.

c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

print("Holt-Winters Model Errors with Multiplicative Seasonality")

## [1] "Holt-Winters Model Errors with Multiplicative Seasonality"

accuracy(fc.myts.hw, myts)

##                      ME      RMSE      MAE        MPE     MAPE      MASE
## Training set   0.812558  9.403028  6.89913  0.3976988 4.191641 0.4147558
## Test set     -26.078074 43.684165 36.75501 -6.4274086 8.886152 2.2096050
##                     ACF1 Theil's U
## Training set -0.08687436        NA
## Test set      0.89762015 0.9498884

print("")

## [1] ""

print("Holt-Winters Model Errors with Multiplicative Seasonality and Dampened Trend")

## [1] "Holt-Winters Model Errors with Multiplicative Seasonality and Dampened Trend"

accuracy(fc.myts.damped, myts)

##                     ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.9000269  9.425183  6.947893  0.3603983 4.217404 0.4176872
## Test set     2.5815068 35.540305 30.946366 -0.1863315 7.152830 1.8604059
##                     ACF1 Theil's U
## Training set -0.09315505        NA
## Test set      0.89976891 0.7207306

The training set RMSE is slightly less for the Holt Winters model with multiplicative seasonality (9.403028 vs. 9.425183), while the model with dampened trend holds up better when looking at the test set. There is a large jump in error under both models, suggesting weak fitting out of sample.

d. Check that the residuals from the best method look like white noise.

checkresiduals(fc.myts.hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 37.559, df = 8, p-value = 9.08e-06
## 
## Model df: 16.   Total lags used: 24

checkresiduals(fc.myts.damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 40.104, df = 7, p-value = 1.202e-06
## 
## Model df: 17.   Total lags used: 24

The low p-values and high Q* for the Ljung-Box test leads us to reject the hypothesis that the data are not significantly different from white noise.

e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?

myts2.train <- window(myts, end=c(2010,12))
#length(myts2.train)
myts2.test <- window(myts, start=c(2011,1))
fc.myts.snaive <- snaive(myts2.train, h=length(myts2.test))
accuracy(fc.myts.snaive, myts.test)

##                    ME     RMSE      MAE       MPE      MAPE     MASE
## Training set 12.11051 25.24173 18.71471  6.299482  8.728705 1.000000
## Test set     83.08611 90.62467 83.08611 17.030562 17.030562 4.439614
##                   ACF1 Theil's U
## Training set 0.7639057        NA
## Test set     0.7944137   2.03766

fc.myts.hw2 <- hw(myts2.train, seasonal="multiplicative", h=length(myts2.test))
accuracy(fc.myts.hw2, myts2.test)

##                      ME     RMSE       MAE        MPE      MAPE      MASE
## Training set  0.3443698 10.02954  7.474705 -0.2325084  4.031062 0.3994026
## Test set     54.5217782 61.42603 54.799584 11.0025099 11.073560 2.9281549
##                    ACF1 Theil's U
## Training set 0.03814269        NA
## Test set     0.80369891  1.357665

autoplot(myts) +
  autolayer(fc.myts.snaive, series="Seasonal Naive", PI=FALSE) +
  autolayer(fc.myts.hw2, series="Holt-Winters with Multiplicative Seasonality", PI=FALSE)

The Holt-Winters model beats the Seasonally-naive one in terms of RMSE of the training set (10.02954 vs 25.24173), as well as the test set (61.42603 vs. 90.62467). However there is a large jump in the RMSE for the test set for both, suggesting poor fit of the models.

9. Australian Retail Data (Continued)

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda <- BoxCox.lambda(myts2.train)
myts.bc <- BoxCox(myts2.train,lambda)
#length(myts.bc)

Applying the ets() function, as expected, we now have no seasonality (thanks to seasonal adjustment), and the error and trends are both additive (thanks to the box-cox transformation).

myts.bc %>% mstl() -> fit #automatic STL model
(fc.myts.bc.ssn <- ets(seasadj(fit)))

## ETS(A,A,N) 
## 
## Call:
##  ets(y = seasadj(fit)) 
## 
##   Smoothing parameters:
##     alpha = 0.511 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 6.0593 
##     b = 0.015 
## 
##   sigma:  0.1146
## 
##      AIC     AICc      BIC 
## 526.9949 527.1719 546.2127

To evaluate the forecasts, we have to back-transform this data, reversing the Box-Cox transformation, before subjecting it to the Holt model.

myts.backtrack <- (lambda * fc.myts.bc.ssn$x + 1) ^ (1/lambda)
fc.myts.bt.holt <- holt(myts.backtrack, h=length(myts2.test))
accuracy(fc.myts.bt.holt, myts2.test)

##                        ME      RMSE      MAE        MPE      MAPE
## Training set  0.002605517  8.973722  6.59631 -0.3164603  3.362551
## Test set     53.437754918 79.672319 60.12208  9.8023328 11.569292
##                   MASE        ACF1 Theil's U
## Training set 0.3510053 -0.03134838        NA
## Test set     3.1992386  0.55913689  1.683323

The training set RMSE for the Box-Cox transformed, seasonally-adjusted model is 8.973722, better than the previous two models. However, the test set error (79.672319) is higher than the Holt-Winter Seasonally adjusted model (although better than the seasonally naive one)