Exercise 7.1: Consider the pigs series — the number of pigs slaughtered in Victoria each month.
pigsdata <- pigs
fc <- ses(pigsdata, h = 4)
summary(fc)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigsdata, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
autoplot(fc) +
autolayer(fitted(fc), series = 'Fitted') +
ylab("Number of Pigs Slaughtered in Victoria") +
xlab('Year')
The parameters estimated by the ses function are \(\alpha = 0.2971\) and the initial level \(\ell_0 = 77260.0561\).
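For reference, in component form simple exponential smoothing updates the level as a weighted average of the newest observation and the previous level, and every forecast equals the final estimated level:
\[
\ell_t = \alpha y_t + (1-\alpha)\,\ell_{t-1}, \qquad \hat{y}_{T+h|T} = \ell_T .
\]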
The reported \(\sigma\) is 10308.58 and the first point forecast is \(\hat{y} = 98816.41\), giving a calculated 95% prediction interval of \[\hat{y} \pm 1.96\sigma = 98816.41 \pm 1.96 \times 10308.58,\] i.e. \(78611.59 - 119021.23\), which compares extremely favorably with the \(78611.97 - 119020.8\) interval provided by R.
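As a quick sanity check, the same interval can be computed directly from the forecast object; a small sketch using the values printed above (fc$lower and fc$upper hold the intervals R produced):
y_hat <- fc$mean[1]            # first point forecast, 98816.41
sigma <- 10308.58              # sigma reported in summary(fc)
c(lower = y_hat - 1.96 * sigma, upper = y_hat + 1.96 * sigma)
cbind(fc$lower[1, 2], fc$upper[1, 2])   # R's own bounds; second column = 95% level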
Exercise 7.5: Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days' sales for paperback and hardcover books.
bookdata<- books
plot(bookdata)
This dataset contains two time series: one for sales of Paperback books and another for Hardcovers. Both series cover 30 days. There appears to be an upward trend in both, with the Paperback series showing more daily fluctuation than the Hardcover series. The Paperback data exhibits drastic but fairly consistent daily swings; those fluctuations appear to decrease over time, though there is too little data to be definitive on the subject.
Use the ses() function to forecast each series, and plot the forecasts.
fc2 <- ses(bookdata[,1], h = 4)
summary(fc2)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = bookdata[, 1], h = 4)
##
## Smoothing parameters:
## alpha = 0.1685
##
## Initial states:
## l = 170.8271
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
autoplot(fc2) +
autolayer(fitted(fc2), series = 'Fitted') +
ylab("Number of Paperback Books Sold") +
xlab('Day')
fc3 <- ses(bookdata[,2], h = 4)
summary(fc3)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = bookdata[, 2], h = 4)
##
## Smoothing parameters:
## alpha = 0.3283
##
## Initial states:
## l = 149.2861
##
## sigma: 33.0517
##
## AIC AICc BIC
## 315.8506 316.7737 320.0542
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
autoplot(fc3) +
autolayer(fitted(fc3), series = 'Fitted') +
ylab("Number of Hardcover Books Sold") +
xlab('Day')
As shown in the summaries above, the training RMSE is 33.63769 for the Paperback series and 31.93101 for the Hardcover series.
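These values can also be pulled out of the accuracy matrices programmatically rather than read off the printed summaries; a small sketch:
accuracy(fc2)[, "RMSE"]   # Paperback training RMSE
accuracy(fc3)[, "RMSE"]   # Hardcover training RMSE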
Exercise 7.6: We will continue with the daily sales of paperback and hardcover books in data set books. Apply Holt's linear method to the paperback and hardback series and compute four-day forecasts in each case.
holt1 <- holt(bookdata[,1], h=4)
holt2 <- holt(bookdata[,2], h=4)
summary(holt1)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = bookdata[, 1], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 170.699
## b = 1.2621
##
## sigma: 33.4464
##
## AIC AICc BIC
## 318.3396 320.8396 325.3456
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
summary(holt2)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = bookdata[, 2], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 147.7935
## b = 3.303
##
## sigma: 29.2106
##
## AIC AICc BIC
## 310.2148 312.7148 317.2208
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
autoplot(bookdata) +
autolayer(holt1, series = 'Holt Paperback forecast', PI=FALSE) +
autolayer(holt2, series = 'Holt Hardcover forecast', PI=FALSE)
e_pap_ses <- tsCV(books[,1], ses, h=4)
e_hard_ses <- tsCV(books[,2], ses, h=4)
e_pap_holt <- tsCV(books[,1], holt, h=4)
e_hard_holt <- tsCV(books[,2], holt, h=4)
#Calculate RMSE from the cross-validation errors
ses_pap_rmse <- sqrt(mean(e_pap_ses^2, na.rm = TRUE))
holt_pap_rmse <- sqrt(mean(e_pap_holt^2, na.rm = TRUE))
ses_hard_rmse <- sqrt(mean(e_hard_ses^2, na.rm = TRUE))
holt_hard_rmse <- sqrt(mean(e_hard_holt^2, na.rm = TRUE))
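Collecting the four cross-validated RMSE values into a single named vector makes the comparison discussed below easier to read; a small convenience sketch:
c(ses_paperback = ses_pap_rmse, holt_paperback = holt_pap_rmse,
  ses_hardcover = ses_hard_rmse, holt_hardcover = holt_hard_rmse)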
The cross-validated RMSE for the Paperback series is 38.4091783 for ses and 46.1478633 for holt, indicating that ses gives the better forecast here. Holt assumes a clear trend, which does seem to be present in our data, but the magnitude of the non-cyclical "noise" is so high that it appears to reduce the accuracy of the Holt method. The degree to which the swings obscure the trend makes the data look almost random, which increases the relative accuracy of the ses method.
For the Hardcover data, the corresponding numbers are 38.740656 and 39.428139. These are much closer, though the ses RMSE is still slightly lower. The trend in this data is more obviously positive, explaining why the Holt forecast fares comparatively better here.
For the Paperback dataset I would likely stick with the ses method, since the trend is not as clear as one would like in such a small set of data. For the Hardcover data I would probably go with the Holt forecast, since it accounts for the upward trend within the data and reflects those changes more strongly.
d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
ses_pap_sig <- 34.8183
l1_predict_ses_pap <- 207.1097   # first paperback point forecast from summary(fc2)
min_ses_pap <- l1_predict_ses_pap - 1.96 * ses_pap_sig
max_ses_pap <- l1_predict_ses_pap + 1.96 * ses_pap_sig
holt_pap_sig <- 33.4464
l1_predict_holt_pap <- 209.4668
min_holt_pap <- l1_predict_holt_pap - 1.96 * holt_pap_sig
max_holt_pap <- l1_predict_holt_pap + 1.96 * holt_pap_sig
ses_hard_sig <- 33.0517
l1_predict_ses_hard <- 239.5601
min_ses_hard <- l1_predict_ses_hard - 1.96 * ses_hard_sig
max_ses_hard <- l1_predict_ses_hard + 1.96 * ses_hard_sig
holt_hard_sig <- 29.2106
l1_predict_holt_hard <- 250.1739
min_holt_hard <- l1_predict_holt_hard - 1.96 * holt_hard_sig
max_holt_hard <- l1_predict_holt_hard + 1.96 * holt_hard_sig
The four calculated 95% prediction intervals are:
Paperback ses: \(138.865832 - 275.353568\)
Paperback holt: \(143.911856 - 275.021744\)
Hardcover ses: \(174.778768 - 304.341432\)
Hardcover holt: \(192.921124 - 307.426676\)
We can see that while the Holt forecasts had higher cross-validated RMSE values, their prediction intervals are somewhat narrower than the corresponding ses intervals for both series (the widths are driven by the model \(\sigma\), which is lower for the Holt fits). For Paperbacks, the holt and ses intervals are nearly identical, with the holt interval slightly narrower and its lower bound about 5 sales higher. For Hardcovers, the centre of the holt interval is shifted up by about 10 sales, with both its lower and upper bounds above the corresponding bounds of the ses interval. In each case the manually computed intervals agree closely with those produced by ses and holt.
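For reference, R's own 95% bounds for the first forecast of each model can be pulled straight from the forecast objects instead of being transcribed by hand; a sketch (under the default settings, the second column of $lower/$upper holds the 95% level):
rbind(ses_paperback  = c(fc2$lower[1, 2],   fc2$upper[1, 2]),
      holt_paperback = c(holt1$lower[1, 2], holt1$upper[1, 2]),
      ses_hardcover  = c(fc3$lower[1, 2],   fc3$upper[1, 2]),
      holt_hardcover = c(holt2$lower[1, 2], holt2$upper[1, 2]))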
Exercise 7.7: For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
fc7_1 <- holt(eggs, h=100)
fc7_2 <- holt(eggs, damped = TRUE, phi = 0.98, h=100)
fc7_3 <- holt(eggs, damped = TRUE, phi = 0.975, h=100)
fc7_4 <- holt(eggs, damped = TRUE, phi = 0.97, h=100)
fc7_5 <- holt(eggs, damped = TRUE, phi = 0.95, h=100)
autoplot (eggs) +
autolayer(fc7_1, series = "Holt's Undamped", PI = FALSE) +
autolayer(fc7_2, series = "Damped Holt's phi = 0.98", PI = FALSE) +
autolayer(fc7_3, series = "Damped Holt's phi = 0.975", PI = FALSE) +
autolayer(fc7_4, series = "Damped Holt's phi = 0.97", PI = FALSE) +
autolayer(fc7_5, series = "Damped Holt's phi = 0.95", PI = FALSE) +
ggtitle("Effect of phi on Holt's Method Forecasts") +
xlab('Year')+
ylab('Price of Dozen Eggs in US ($)')+
guides(colour=guide_legend(title = 'Forecast'))
We can see here that applying a damping factor \(\phi\) changes the forecasts from following the overall downward trend indefinitely, as the undamped model does (which is impossible, since it implies a negative egg price somewhere around 2015), to gradually leveling off. This seems like a great example where damping is critical for the model to reflect reality. Adjusting the coefficient \(\phi\) controls how quickly the damping takes effect, with the highest value of 0.98 producing the slowest and most gradual flattening.
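The effect is easiest to see in the forecast equation for the damped trend method, where the trend contribution is multiplied by a geometric sum in \(\phi\) that converges as the horizon \(h\) grows (approaching \(\ell_T + \frac{\phi}{1-\phi}b_T\)) instead of increasing linearly:
\[
\hat{y}_{T+h|T} = \ell_T + (\phi + \phi^2 + \cdots + \phi^h)\,b_T .
\]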
fc7_6 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.3, h=100)
fc7_7 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.4, h=100)
fc7_8 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.6, h=100)
fc7_9 <- holt(eggs, damped = TRUE, phi = 0.98, alpha = 0.8, h=100)
autoplot (eggs) +
autolayer(fc7_2, series = "Damped Holt's alpha estimated (default)", PI = FALSE) +
autolayer(fc7_6, series = "Damped Holt's alpha = 0.3", PI = FALSE) +
autolayer(fc7_7, series = "Damped Holt's alpha = 0.4", PI = FALSE) +
autolayer(fc7_8, series = "Damped Holt's alpha = 0.6", PI = FALSE) +
autolayer(fc7_9, series = "Damped Holt's alpha = 0.8", PI = FALSE) +
ggtitle("Effect of alpha on Holt's Method Forecasts (phi = 0.98)") +
xlab('Year')+
ylab('Price of Dozen Eggs in US ($)')+
guides(colour=guide_legend(title = 'Forecast'))+
ylim(25,80)
Adjusting the \(\alpha\) coefficient controls how much weight is placed on the most recent datapoints when calculating the prediction. Reducing this coefficient increases the relative weight of older datapoints. We can see from the graph above that the starting point of the prediction largely depends on \(\alpha\), with lower coefficients corresponding to higher initial forecasts - this holds because the overall trend of the data is downward, so older datapoints tend to have higher values. This is not, however, a simple vertical shift of the entire projection - the prediction with the highest \(\alpha = 0.8\) starts at the lowest value, yet by the end of the horizon it sits above many of the others.
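To answer which of these models gives the best RMSE, the training accuracy of each holt() fit can be compared directly; a sketch (output not shown, with the Box-Cox variant mentioned in the exercise included only as a commented suggestion):
fits <- list(undamped = fc7_1, phi_0.98 = fc7_2, phi_0.975 = fc7_3,
             phi_0.97 = fc7_4, phi_0.95 = fc7_5,
             alpha_0.3 = fc7_6, alpha_0.4 = fc7_7, alpha_0.6 = fc7_8, alpha_0.8 = fc7_9)
sapply(fits, function(f) accuracy(f)[, "RMSE"])   # training RMSE for each fit
#fc7_bc <- holt(eggs, lambda = BoxCox.lambda(eggs), h = 100)   # possible Box-Cox variant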
Exercise 7.8: Recall your retail time series data (from Exercise 3 in Section 2.10).
#Data available at https://github.com/mkollontai/DATA624
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349874C"],
frequency=12, start=c(1982,4))
autoplot(myts)
In the graph above it is evident that the seasonal variation in the data increases as time moves forward. An additive decomposition could still be applied, but it would leave a seasonal component whose magnitude changes over time. Using the multiplicative method instead keeps the seasonal component at a roughly uniform magnitude - something much easier to work with in subsequent manipulations and analyses.
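In classical decomposition terms, the additive form treats the seasonal component as a fixed offset while the multiplicative form treats it as a proportional factor, which is why the latter handles seasonal swings that grow with the level of the series:
\[
\text{additive: } y_t = S_t + T_t + R_t, \qquad \text{multiplicative: } y_t = S_t \times T_t \times R_t .
\]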
fc8_1 <- hw(myts, seasonal='multiplicative', h=50)
fc8_2 <- hw(myts, seasonal='multiplicative', damped = TRUE, h=50)
autoplot (myts) +
autolayer(fc8_1, series = "Holt-Winters Undamped") +
autolayer(fc8_2, series = "Damped Holt-Winters") +
ggtitle("Holt-Winters Method Forecasts of Retail Data") +
xlab('Year')+
ylab('Retail Data')+
guides(colour=guide_legend(title = 'Forecast'))
In this instance, the damping appears to flatten out the trend of the prediction almost completely.
accuracy(fc8_1)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.2547906 8.354074 5.050794 -0.5269882 4.330325 0.533214
## ACF1
## Training set 0.1163112
accuracy(fc8_2)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.6737245 8.391557 5.007798 0.4181949 4.182986 0.5286749 0.1004214
The undamped prediction has a slightly lower training RMSE. It's difficult to select one as clearly better, but to me the damped version seems preferable, as the data does appear to be leveling off as time progresses.
autoplot(residuals(fc8_2))
The residuals of the damped method do indeed look like white noise, though there are a few noticeable spikes around 2000 and 2010.
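A more formal check of the white-noise assumption could use checkresiduals(), which plots the residuals, their ACF, and a histogram, and runs a Ljung-Box test; a sketch (output not shown):
checkresiduals(fc8_2)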
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
fc_hwd <- hw(myts.train, seasonal='multiplicative', damped = TRUE)
fc_hw <- hw(myts.train, seasonal='multiplicative')
fc_n <- snaive(myts.train)
accuracy(fc_n,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.122823 12.45603 8.666967 5.906047 7.848706 1.000000
## Test set -31.379167 34.36303 31.379167 -18.882634 18.882634 3.620548
## ACF1 Theil's U
## Training set 0.5367142 NA
## Test set 0.1146989 0.8221791
accuracy(fc_hw,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1138424 7.899346 4.50940 -0.5483582 4.082228 0.5202973
## Test set -33.3031918 35.182170 33.30319 -20.1724710 20.172471 3.8425428
## ACF1 Theil's U
## Training set 0.1476838 NA
## Test set 0.3629014 0.8667639
accuracy(fc_hwd,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.5156628 7.575832 4.604223 0.3071277 4.175587 0.5312381
## Test set -32.3466216 34.439038 33.113598 -20.0581682 20.295549 3.8206674
## ACF1 Theil's U
## Training set 0.0879495 NA
## Test set 0.1427832 0.855879
The test-set RMSE for the seasonal naive forecast (34.36) is ever so slightly lower than that of both Holt-Winters methods, and the damped method (34.44) beats out the undamped method (35.18) in terms of RMSE.
Exercise 7.9: For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
autoplot(myts)
myts_la <- BoxCox.lambda(myts)
stl <- stlf(myts.train, lambda = myts_la)
ets <- ets(seasadj(decompose(myts.train, 'multiplicative')))
autoplot(myts.train) +
autolayer(forecast(stl, h = 24), series = 'STL') +
autolayer(forecast(ets, h = 24, PI = FALSE), series = 'ETS') +
autolayer(myts.test, series = 'True Data')
accuracy(stl,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0609065 6.907577 3.940829 -0.1113073 3.42420 0.4546953
## Test set -43.5154392 45.809387 43.515439 -26.3661826 26.36618 5.0208382
## ACF1 Theil's U
## Training set 0.02776491 NA
## Test set 0.46472471 1.134057
ets$mse
## [1] 47.25964
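Since ets$mse reports a mean squared error rather than an RMSE, it helps to convert it before comparing with the RMSE figures above; a small sketch (accuracy() on the fitted model reports the training measures directly):
sqrt(ets$mse)             # ETS training RMSE on the seasonally adjusted series
accuracy(ets)[, "RMSE"]   # should agree closely with the value above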
The resulting training RMSE for the ETS model, \(\sqrt{47.26} \approx 6.87\), is essentially the same as the STL method's training RMSE of 6.91. On the test set, however, the STL forecast's RMSE of 45.81 is well above the 34.44 achieved by the damped Holt-Winters method in the previous exercise, so the damped Holt-Winters remains the best of these forecasts (aside from the seasonal naive).