DATA 624 - HW5

Do exercises 7.1, 7.5,7.6, 7.7, 7.8 and 7.9 in Hyndman. Please submit both the link to your Rpubs and the .rmd file.

7.1a

Consider the pigs series — the number of pigs slaughtered in Victoria each month.

Use the ses() function in R to find the optimal values of
α and ℓ0, and generate forecasts for the next four months.

Smoothing parameters: alpha = 0.9358

Initial states: l = 348.1644

library(fpp2)
library(forecast)
str(pigs)

##  Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...

head(pigs)

##         Jan    Feb    Mar    Apr    May    Jun
## 1980  76378  71947  33873  96428 105084  95741

pigsdata <- window(oil, start=1990)
autoplot(pigsdata) + ylab("Oil of Pigs") + xlab("Year")

fc <- ses(pigsdata, h=4)

summary(fc)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigsdata, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.9358 
## 
##   Initial states:
##     l = 348.1644 
## 
##   sigma:  31.7683
## 
##      AIC     AICc      BIC 
## 246.1915 247.3915 249.7257 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 8.661726 30.41583 22.39636 1.727791 4.793548 0.9704074
##                     ACF1
## Training set -0.08750726
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2014       542.7089 501.9962 583.4216 480.4442 604.9736
## 2015       542.7089 486.9488 598.4690 457.4311 627.9867
## 2016       542.7089 475.1748 610.2430 439.4244 645.9935
## 2017       542.7089 465.1684 620.2494 424.1210 661.2968

print(round(accuracy(fc),2))

##                ME  RMSE  MAE  MPE MAPE MASE  ACF1
## Training set 8.66 30.42 22.4 1.73 4.79 0.97 -0.09

#               ME  RMSE  MAE  MPE MAPE MASE  ACF1
#Training set 8.66 30.42 22.4 1.73 4.79 0.97 -0.09

autoplot(fc) + autolayer(fitted(fc), series="Fitted") + 
  ylab("Number of pigs") + xlab("Year")

7.1b

Compute a 95% prediction interval for the first forecast using y(hat) ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
The calculated 95% intervals are more narrow [484.3332, 601.0846] than SES interval produced by R.

std_dev <- sd(fc$residuals)
std_dev

## [1] 29.78352

p.low <- fc$mean - 1.96 * std_dev
p.high <- fc$mean + 1.96 * std_dev

p.low

## Time Series:
## Start = 2014 
## End = 2017 
## Frequency = 1 
## [1] 484.3332 484.3332 484.3332 484.3332

p.high

## Time Series:
## Start = 2014 
## End = 2017 
## Frequency = 1 
## [1] 601.0846 601.0846 601.0846 601.0846

fc

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2014       542.7089 501.9962 583.4216 480.4442 604.9736
## 2015       542.7089 486.9488 598.4690 457.4311 627.9867
## 2016       542.7089 475.1748 610.2430 439.4244 645.9935
## 2017       542.7089 465.1684 620.2494 424.1210 661.2968

7.5a

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

Plot the series and discuss the main features of the data. Paperback and Hardcover books sales shows an increasing trend but no apparent evidence of seasonality cyclic factors.

data(books)
str(books)

##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"

#head(books)
autoplot(books)

7.5b

Use the ses() function to forecast each series, and plot the forecasts.

# SES

# Paperback
fc_paperback_ses <- ses((books[,1]), h=4)
summary(fc_paperback_ses)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = (books[, 1]), h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.1685 
## 
##   Initial states:
##     l = 170.8271 
## 
##   sigma:  34.8183
## 
##      AIC     AICc      BIC 
## 318.9747 319.8978 323.1783 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005

autoplot(fc_paperback_ses) + autolayer(fitted(fc_paperback_ses), series="Fitted") + 
  ylab("Daily Sales") + xlab("Year") + ggtitle('Books - Paperback -  SES Forcast')

# Hardcover
fc_hardcover_ses <- ses((books[,2]), h=4)
summary(fc_hardcover_ses)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = (books[, 2]), h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2861 
## 
##   sigma:  33.0517
## 
##      AIC     AICc      BIC 
## 315.8506 316.7737 320.0542 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
## 32       239.5601 194.9788 284.1414 171.3788 307.7414
## 33       239.5601 192.8607 286.2595 168.1396 310.9806
## 34       239.5601 190.8347 288.2855 165.0410 314.0792

autoplot(fc_hardcover_ses) + autolayer(fitted(fc_hardcover_ses), series="Fitted") + 
  ylab("Daily Sales") + xlab("Days") + ggtitle('Books - Hardcover -  SES Forcast')

7.5c

Compute the RMSE values for the training data in each case.

cat('Paperback SES\n')

## Paperback SES

print(round(accuracy(fc_paperback_ses),2))

##                ME  RMSE   MAE  MPE  MAPE MASE  ACF1
## Training set 7.18 33.64 27.84 0.47 15.58  0.7 -0.21

cat('\n\nHardcover SES\n')

## 
## 
## Hardcover SES

print(round(accuracy(fc_hardcover_ses),2))

##                ME  RMSE   MAE  MPE  MAPE MASE  ACF1
## Training set 9.17 31.93 26.77 2.64 13.39  0.8 -0.14

7.6a

We will continue with the daily sales of paperback and hardcover books in data set books.

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

# Holt's linear method
# Paperback
fc_paperback_holts <- holt((books[,1]), h=4)
summary(fc_paperback_holts)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = (books[, 1]), h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 170.699 
##     b = 1.2621 
## 
##   sigma:  33.4464
## 
##      AIC     AICc      BIC 
## 318.3396 320.8396 325.3456 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

autoplot(fc_paperback_holts) + autolayer(fitted(fc_paperback_holts), series="Fitted") + 
  ylab("Daily Sales") + xlab("Year") + ggtitle('Books - Paperback -  Holt\'s Forcast')

# Hardcover
fc_hardcover_holts <- holt((books[,2]), h=4)
summary(fc_hardcover_holts)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = (books[, 2]), h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 147.7935 
##     b = 3.303 
## 
##   sigma:  29.2106
## 
##      AIC     AICc      BIC 
## 310.2148 312.7148 317.2208 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

autoplot(fc_hardcover_holts) + autolayer(fitted(fc_hardcover_holts), series="Fitted") + 
  ylab("Daily Sales") + xlab("Days") + ggtitle('Books - Hardcover -  Holt\'s Forcast')

7.6b

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

cat('Paperback SES\n')

## Paperback SES

print(round(accuracy(fc_paperback_ses),2))

##                ME  RMSE   MAE  MPE  MAPE MASE  ACF1
## Training set 7.18 33.64 27.84 0.47 15.58  0.7 -0.21

cat('\nPaperback Holts\n')

## 
## Paperback Holts

print(round(accuracy(fc_paperback_holts),2))

##                 ME  RMSE   MAE   MPE  MAPE MASE  ACF1
## Training set -3.72 31.14 26.18 -5.51 15.58 0.66 -0.18

cat('\n\nHardcover SES\n')

## 
## 
## Hardcover SES

print(round(accuracy(fc_hardcover_ses),2))

##                ME  RMSE   MAE  MPE  MAPE MASE  ACF1
## Training set 9.17 31.93 26.77 2.64 13.39  0.8 -0.14

cat('\n\nHardcover Holts\n')

## 
## 
## Hardcover Holts

print(round(accuracy(fc_hardcover_holts),2))

##                 ME  RMSE   MAE   MPE  MAPE MASE  ACF1
## Training set -0.14 27.19 23.16 -2.11 12.16 0.69 -0.03

7.6c

Compare the forecasts for the two series using both methods. Which do you think is best?

The Holt’s time series has a better accuracy that SES for both RMSE AND MAE.

# The Holt's time series has a better accuracy that SES for both RMSE AND MAE

7.6d

Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

ses_paperback.low <- fc_paperback_ses$mean[1] - 1.96 * sd(fc_paperback_ses$residuals)
ses_paperback.high <- fc_paperback_ses$mean[1] + 1.96 * sd(fc_paperback_ses$residuals)

ses_hardcover.low <- fc_hardcover_ses$mean[1] - 1.96 * sd(fc_hardcover_ses$residuals)
ses_hardcover.high <- fc_hardcover_ses$mean[1] + 1.96 * sd(fc_hardcover_ses$residuals)

holts_paperback.low <- fc_paperback_holts$mean[1] - 1.96 * sd(fc_paperback_holts$residuals)
holts_paperback.high <- fc_paperback_holts$mean[1] + 1.96 * sd(fc_paperback_holts$residuals)

holts_hardcover.low <- fc_hardcover_holts$mean[1] - 1.96 * sd(fc_hardcover_holts$residuals)
holts_hardcover.high <- fc_hardcover_holts$mean[1] + 1.96 * sd(fc_hardcover_holts$residuals)

ses_paperback <- c(ses_paperback.low, ses_paperback.high)  
ses_hardcover <- c(ses_hardcover.low, ses_hardcover.high) 
holts_paperback <- c(holts_paperback.low, holts_paperback.high)
holts_hardcover <- c(holts_hardcover.low, holts_hardcover.high)

con.int <- data.frame(ses_paperback, holts_paperback, ses_hardcover, holts_hardcover)
con.int

##   ses_paperback holts_paperback ses_hardcover holts_hardcover
## 1      141.5964        147.8390      178.5848        195.9640
## 2      272.6230        271.0945      300.5354        304.3838

7.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE? Based on the RMSE, Holt with Box Cox gives the best RMSE: 17.48

str(eggs)

##  Time-Series [1:94] from 1900 to 1993: 277 315 315 321 315 ...

autoplot(eggs)

# eggs

eggsdata <- window(eggs,start=1950)

# Without 
fc_eggs_holts <- holt(eggsdata, h=100)
autoplot(fc_eggs_holts) + autolayer(fitted(fc_eggs_holts), series="Fitted") + 
  ylab("Price per Dozen Eggs") + xlab("Years") + ggtitle('Holt\'s Without Damped or BoxCox')

#Accuracy - RMSE
cat('Holt\'s Without Damped or BoxCox\n')

## Holt's Without Damped or BoxCox

print(round(accuracy(fc_eggs_holts),2))

##                 ME  RMSE   MAE   MPE MAPE MASE  ACF1
## Training set -0.57 18.14 13.23 -0.54 9.53 0.81 -0.06

# With Damped trend
fc_eggs_holts <- holt(eggsdata, h=100, damped = TRUE)
autoplot(fc_eggs_holts) + autolayer(fitted(fc_eggs_holts), series="Fitted") + 
  ylab("Price per Dozen Eggs") + xlab("Years") + ggtitle('Eggs -  Holt\'s Forcast With Damped Trend')

#Accuracy - RMSE
cat('Holt\'s With Damped = TRUE \n')

## Holt's With Damped = TRUE

print(round(accuracy(fc_eggs_holts),2))

##                 ME  RMSE  MAE   MPE MAPE MASE  ACF1
## Training set -0.54 17.58 13.3 -1.96 9.99 0.81 -0.02

# With Box Cox
eggs_lambda <- BoxCox.lambda(eggsdata)
print('BoxCox Lambda')

## [1] "BoxCox Lambda"

eggs_lambda

## [1] 0.1515035

fc_eggs_holts <- holt(eggsdata, h=100,lambda = eggs_lambda )
autoplot(fc_eggs_holts) + autolayer(fitted(fc_eggs_holts), series="Fitted") + 
  ylab("Price per Dozen Eggs") + xlab("Years") + ggtitle('Eggs -  Holt\'s Forcast With Box Cox')

#Accuracy - RMSE
cat('Holt\'s With BoxCox\n')

## Holt's With BoxCox

print(round(accuracy(fc_eggs_holts),2))

##                ME  RMSE   MAE   MPE MAPE MASE  ACF1
## Training set 0.84 17.48 12.97 -0.82 9.55 0.79 -0.06

7.8a

Recall your retail time series data (from Exercise 3 in Section 2.10).

Why is multiplicative seasonality necessary for this series?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts_retail <- ts(retaildata[,"A3349873A"],frequency=12, start=c(1982,4))
autoplot(myts_retail)

7.8b

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

# With Box Cox without damped

fc_retail_hw <- hw(myts_retail, h=100,seasonal = 'multiplicative')
autoplot(fc_retail_hw) + autolayer(fitted(fc_retail_hw), series="Fitted") + 
  ylab("Sales") + xlab("Years") + ggtitle('Retail - Holt-Winters')

# With Box Cox with damped

fc_retail_hw_damped <- hw(myts_retail, h=100,seasonal = 'multiplicative', damped = TRUE)
autoplot(fc_retail_hw_damped) + autolayer(fitted(fc_retail_hw_damped), series="Fitted") + 
  ylab("Sales") + xlab("Years") + ggtitle('Retail - Holt-Winters with Damped Trend')

7.8c

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer? Interestingly enough, both RMSE is almost the same. The forcast without the damped trend is slightly better with an RMSE of 13.29 as opposed to the HW with damped trend at 13.3

#Accuracy - RMSE
cat('Retail - Holt-Winters without Damped Trend\n')

## Retail - Holt-Winters without Damped Trend

print(round(accuracy(fc_retail_hw),2))

##                ME  RMSE  MAE   MPE MAPE MASE ACF1
## Training set 0.12 13.29 8.99 -0.12 3.92 0.47 0.09

cat('\n\n')

#Accuracy - RMSE
cat('Retail - Holt-Winters With Damped Trend\n')

## Retail - Holt-Winters With Damped Trend

print(round(accuracy(fc_retail_hw_damped),2))

##                ME RMSE  MAE  MPE MAPE MASE ACF1
## Training set 1.41 13.3 9.04 0.61 3.96 0.48 0.04

7.8d

Check that the residuals from the best method look like white noise. Based on the residual plot, the residuals look ike white noice. They are mostly between the guard bands and does not seem to have any seaonality effects.

ggAcf(fc_retail_hw$residuals)

7.8e

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7? The error is for the naive approach is 100, which is far greater than 13.19. I will have to do more digging as this seems incorrect.

myts_retail.train <- window(myts_retail, end = c(2010, 12))
myts_retail.test <- window(myts_retail, start = 2011)

fc_seasonal_naive <- snaive(myts_retail.train, h = 36)
accuracy(fc_seasonal_naive, myts_retail.test)

##                     ME      RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973  20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     81.744444 100.00869 82.06667 20.549055 20.669738 5.143067
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.6830879   1.67023

autoplot(window(myts_retail, start = 2011)) + autolayer(fc_seasonal_naive, series = "Seasonal Naive")

7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set? The RMSE error is much higher than the

fc_retail_stl <- stlm(myts_retail.train, lambda = "auto")
fc_retail_stl_forecast <- forecast(fc_retail_stl)
accuracy(fc_retail_stl_forecast, myts_retail.test)

##                     ME      RMSE       MAE        MPE      MAPE      MASE
## Training set -0.646515  8.147517  5.855529 -0.3001347  2.885877 0.3669624
## Test set     56.462144 71.117890 56.462144 15.5079003 15.507900 3.5384474
##                    ACF1 Theil's U
## Training set 0.02389584        NA
## Test set     0.35938690  1.322933

autoplot(window(myts_retail, start = 2011)) + autolayer(fc_retail_stl_forecast, series = "Box-Cox ETS")