Data 624 Homework 5

(7.1)

Consider the pigs series - the number of pigs slaughtered in Victoria each month.

Use the ses() function in R to find the optimal values of α and ℓ₀, and generate forecasts for the next four months.

library(fpp2)

## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------------------- fpp2 2.4 --

## v ggplot2   3.1.0     v fma       2.4  
## v forecast  8.12      v expsmooth 2.3

## Warning: package 'forecast' was built under R version 3.5.3

## Warning: package 'fma' was built under R version 3.5.3

## Warning: package 'expsmooth' was built under R version 3.5.3

##

library(gridExtra)

summary(pigs)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   33873   79080   91662   90640  101493  120184

autoplot(pigs)

The optimal values are alpha = 0.2971 and l0 = 77260.0561, with sigma = 10308.58.

fc <- ses(pigs,h=4)
fc$model

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

Forecast for the next 4 months

forecast(fc)

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8
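All four point forecasts above are identical because simple exponential smoothing has no trend or seasonal component: every forecast equals the last smoothed level. A minimal base-R sketch of that recursion follows. The helper `ses_level()` and the toy numbers are assumptions made for illustration; this is not the forecast package's code and does not use the pigs data.

```r
# Hypothetical helper: run the SES level recursion and return the final level.
# The final level is the point forecast for every horizon h.
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    # New level = weighted average of the new observation and the old level
    level <- alpha * obs + (1 - alpha) * level
  }
  level
}

# Toy series (made-up numbers, not the pigs data)
y_toy <- c(10, 12, 11, 13, 12)
final_level <- ses_level(y_toy, alpha = 0.3, l0 = 10)

# Forecasts for h = 1, ..., 4 are all the same value
toy_forecasts <- rep(final_level, 4)
print(toy_forecasts)
```

With alpha close to 1 the level tracks the most recent observation; with alpha close to 0 it stays near the initial level l0.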

Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

s <- sd(fc$residuals)
y_hat <- fc$mean[1]
c(y_hat - 1.96 * s, y_hat + 1.96 * s)

## [1]  78679.97 118952.84

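The manually computed 95% prediction interval is [78679.97, 118952.84], while the interval produced by R is [78611.97, 119020.8]. The two are very close. R's interval is slightly wider because it is based on the model's sigma (10308.58), which is a little larger than the sample standard deviation of the residuals (about 10273.7, as implied by the manual interval).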
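The two intervals can be reconciled from the numbers printed above. The sketch below uses only values copied from the output. It rebuilds R's interval from the model's sigma and backs out the standard deviation implied by the manual interval. The variable names are illustrative.

```r
# Values copied from the output above
point_forecast <- 98816.41    # first point forecast (Sep 1995)
model_sigma    <- 10308.58    # sigma reported by fc$model
manual_lower   <- 78679.97    # manual interval from sd(fc$residuals)
manual_upper   <- 118952.84

# Interval built from the model's sigma, using the exact normal quantile
z <- qnorm(0.975)
sigma_interval <- point_forecast + c(-1, 1) * z * model_sigma
print(round(sigma_interval, 2))

# Standard deviation implied by the manual interval (half-width / 1.96)
implied_sd <- (manual_upper - manual_lower) / (2 * 1.96)
print(round(implied_sd, 2))
```

The interval from sigma matches R's reported interval up to rounding, and the implied standard deviation is slightly smaller than sigma, which accounts for the narrower manual interval.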

(7.5)

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

Plot the series and discuss the main features of the data.

Both series show an overall increasing trend, and there is no obvious seasonality in the plot. Hardcover sales appear to increase more over time than paperback sales.

autoplot(books)

summary(books)

##    Paperback       Hardcover    
##  Min.   :111.0   Min.   :128.0  
##  1st Qu.:167.2   1st Qu.:170.5  
##  Median :189.0   Median :200.5  
##  Mean   :186.4   Mean   :198.8  
##  3rd Qu.:207.2   3rd Qu.:222.0  
##  Max.   :247.0   Max.   :283.0

Use the ses() function to forecast each series, and plot the forecasts.

paperback_sales <- books[,1]
hardcover_sales <- books[,2]

paperback_ses <- ses(paperback_sales, h=4)
hardcover_ses <- ses(hardcover_sales, h=4)

Paperback sales forecast for the next 4 days.

forecast(paperback_ses)

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005

Hardcover sales forecast for the next 4 days.

forecast(hardcover_ses)

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
## 32       239.5601 194.9788 284.1414 171.3788 307.7414
## 33       239.5601 192.8607 286.2595 168.1396 310.9806
## 34       239.5601 190.8347 288.2855 165.0410 314.0792

Paperback

(paperback_plot <- autoplot(paperback_ses) + autolayer(fitted(paperback_ses), series = "Predicted Paperback") + xlab("Time") + ylab("Sales"))

Hardcover

(hardcover_plot <- autoplot(hardcover_ses) + autolayer(fitted(hardcover_ses), series = "Predicted Hardcover") + xlab("Time") + ylab("Sales"))

Compute the RMSE values for the training data in each case.

(rmse_paperback <- accuracy(paperback_ses)[1, "RMSE"])

## [1] 33.63769

(rmse_hardcover <- accuracy(hardcover_ses)[1, "RMSE"])

## [1] 31.93101

The hardcover model has the lower training RMSE (31.93 vs. 33.64 for paperback).
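For reference, the training RMSE reported by accuracy() is the square root of the mean squared one-step residual. A minimal base-R sketch with made-up numbers follows. The helper `rmse()` and the toy values are assumptions for illustration, not the books data.

```r
# Hypothetical helper: root mean squared error of a vector of residuals
rmse <- function(residuals) {
  sqrt(mean(residuals^2, na.rm = TRUE))
}

# Toy actual and fitted values (made up, not the books data)
toy_actual <- c(200, 210, 190, 220)
toy_fitted <- c(195, 215, 200, 210)

toy_rmse <- rmse(toy_actual - toy_fitted)
print(toy_rmse)
```

Because RMSE is in the units of the data, the paperback and hardcover values are directly comparable here: both are counts of books sold per day.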

(7.6)

Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

paperback_holt <- holt(paperback_sales, h=4)
hardcover_holt <- holt(hardcover_sales, h=4)

hardcover_holt

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

paperback_holt

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

(rmse_hardcover_holt <- accuracy(hardcover_holt)[1, "RMSE"])

## [1] 27.19358

For hardcover, the SES RMSE is 31.93101 and the Holt RMSE is 27.19358, so Holt's method fits the training data better.

(rmse_paperback_holt <- accuracy(paperback_holt)[1, "RMSE"])

## [1] 31.13692

For paperback, the SES RMSE is 33.63769 and the Holt RMSE is 31.13692, so Holt's method again fits the training data better.

Compare the forecasts for the two series using both methods. Which do you think is best?

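Holt's method forecasts a continuing upward trend for both paperback and hardcover sales, whereas the SES forecasts are flat. Holt's method also has the lower training RMSE for both series, although some improvement is expected because it uses one more parameter than SES. Since both series trend upward, I think Holt's method is the better choice.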

autoplot(paperback_sales) + autolayer(paperback_ses, series = 'SES', PI=FALSE) + autolayer(paperback_holt, series = 'Holt', PI=FALSE)

autoplot(hardcover_sales) + autolayer(hardcover_ses, series = 'SES', PI=FALSE) + autolayer(hardcover_holt, series = 'Holt', PI=FALSE)
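The Holt forecasts above rise by a constant amount each day because the h-step forecast is the last level plus h times the last trend. The base-R sketch below shows the textbook level and trend recursion on a toy series. The helper `holt_forecast()` and its inputs are assumptions for illustration. It is not the forecast package's implementation, which also estimates the parameters and initial states.

```r
# Hypothetical helper: Holt's linear method with fixed parameters.
# Returns point forecasts for horizons 1..h.
holt_forecast <- function(y, alpha, beta, l0, b0, h) {
  level <- l0
  trend <- b0
  for (obs in y) {
    prev_level <- level
    # Level: blend the observation with the previous one-step forecast
    level <- alpha * obs + (1 - alpha) * (prev_level + trend)
    # Trend: blend the latest change in level with the previous trend
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  level + seq_len(h) * trend
}

# Toy series: a perfect straight line (made up, not the books data)
y_line <- 1:10
fc_line <- holt_forecast(y_line, alpha = 0.5, beta = 0.5, l0 = 0, b0 = 1, h = 4)
print(fc_line)

# Differences between the hardcover Holt forecasts copied from the output above
hardcover_fc <- c(250.1739, 253.4765, 256.7791, 260.0817)
print(diff(hardcover_fc))
```

The constant differences in the last line are the estimated daily trend for hardcover sales; SES has no such term, which is why its forecasts are flat.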

(7.7)

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

eggs_holt <-holt(eggs,h=100)
eggs_holt_dampened <-holt(eggs,damped = TRUE,h=100)
eggs_holt_boxCox <-holt(eggs, lambda=BoxCox.lambda(eggs),h=100)
eggs_holt_boxCox_dampened <-holt(eggs, lambda=BoxCox.lambda(eggs),damped=TRUE,h=100)

p1 <- autoplot(eggs) + autolayer(eggs_holt, series="Holt")
p2 <- autoplot(eggs) + autolayer(eggs_holt_dampened, series="Holt Dampened")
p3 <- autoplot(eggs) + autolayer(eggs_holt_boxCox, series="Holt BoxCox")
p4 <- autoplot(eggs) + autolayer(eggs_holt_boxCox_dampened, series="Holt BoxCox Dampened")

grid.arrange(p1, p2, p3,p4, nrow = 2)

Holt's method - the linear trend is extrapolated far enough that the forecasts become negative, which is impossible for a price.

Holt with damped trend - damping avoids the negative forecasts, but the forecasts still follow a clearly negative trend.

Holt with Box-Cox - Holt's method applied to the Box-Cox transformed series. The forecasts show a general downward trend, but not as drastic as in the first Holt model.

Holt with Box-Cox and damped trend - combines the Box-Cox transformation with a damped trend. The forecasts are broadly similar to those from Holt with Box-Cox.

In terms of RMSE, Holt with Box-Cox has the lowest value (26.39376), so it is the best model by that measure, although the RMSEs of all four models are very similar (26.39 to 26.58).

accuracy(eggs_holt)

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.04499087 26.58219 19.18491 -1.142201 9.653791 0.9463626
##                    ACF1
## Training set 0.01348202

accuracy(eggs_holt_dampened)

##                     ME     RMSE     MAE       MPE     MAPE      MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
##                      ACF1
## Training set -0.003195358

accuracy(eggs_holt_boxCox)

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.7736844 26.39376 18.96387 -1.072416 9.620095 0.9354593
##                    ACF1
## Training set 0.03887152

accuracy(eggs_holt_boxCox_dampened)

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.8200445 26.53321 19.45654 -2.019718 9.976131 0.9597618
##                     ACF1
## Training set 0.005852382
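To build intuition for what the damped and lambda arguments compared above do to long-horizon forecasts, here is a minimal base-R sketch. The helper names, the toy level and trend, and the choice lambda = 0 (a log transform) are all assumptions for illustration. The value returned by BoxCox.lambda(eggs) is not shown in this document, and nothing here is fitted to the eggs data.

```r
# Damped trend: the h-step forecast adds (phi + phi^2 + ... + phi^h) * trend
damped_forecast <- function(level, trend, phi, h) {
  level + cumsum(phi^seq_len(h)) * trend
}

# Toy state with a negative trend (made-up values)
linear_fc <- damped_forecast(level = 70, trend = -1, phi = 1,   h = 100)
damped_fc <- damped_forecast(level = 70, trend = -1, phi = 0.9, h = 100)

# Box-Cox transform and its inverse (standard definition; lambda = 0 is the log)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

# A linear decline on the log scale stays positive after back-transformation
log_scale_fc <- damped_forecast(level = box_cox(70, 0), trend = -0.02, phi = 1, h = 100)
boxcox_fc <- inv_box_cox(log_scale_fc, 0)

print(range(linear_fc))   # undamped linear trend eventually goes negative
print(range(damped_fc))   # damped trend levels off
print(range(boxcox_fc))   # back-transformed forecasts remain positive
```

With phi = 1 the damped formula reduces to the ordinary linear trend. With phi below 1 the forecasts flatten towards level + trend * phi / (1 - phi).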

(7.8)

Recall your retail time series data (from Exercise 3 in Section 2.10).

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],frequency=12, start=c(1982,4))
autoplot(myts)

Why is multiplicative seasonality necessary for this series?

The plot above shows that the seasonal variation changes over time instead of staying constant. Multiplicative seasonality handles seasonal variation that changes with the level of the series, whereas the additive method assumes the seasonal variation is constant.
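The difference between the two kinds of seasonality can be seen on synthetic data. The base-R sketch below builds one additive and one multiplicative toy series on the same trend and compares the size of the seasonal swing in the first and last year. All numbers are made up for illustration and are unrelated to the retail series.

```r
# Ten years of monthly data on a rising level (synthetic)
months <- 1:120
trend_level <- 100 + 2 * months
season <- sin(2 * pi * months / 12)

# Additive: a fixed seasonal amount is added to the level
additive_series <- trend_level + 20 * season
# Multiplicative: the seasonal effect is a percentage of the level
multiplicative_series <- trend_level * (1 + 0.2 * season)

# Seasonal swing = range of the detrended values within a set of months
seasonal_swing <- function(y, level, idx) {
  diff(range(y[idx] - level[idx]))
}

add_first  <- seasonal_swing(additive_series, trend_level, 1:12)
add_last   <- seasonal_swing(additive_series, trend_level, 109:120)
mult_first <- seasonal_swing(multiplicative_series, trend_level, 1:12)
mult_last  <- seasonal_swing(multiplicative_series, trend_level, 109:120)

print(c(add_first = add_first, add_last = add_last))
print(c(mult_first = mult_first, mult_last = mult_last))
```

The additive swing is the same in both years, while the multiplicative swing grows with the level, which is the pattern described for the retail plot.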

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

myts_holt_winters <- hw(myts, seasonal = "multiplicative", h=100)
myts_holt_winters_damped  <- hw(myts, damped =TRUE, seasonal = "multiplicative", h=100)


autoplot(myts) + autolayer(myts_holt_winters, series='Holt Winters', PI=FALSE)  +
    autolayer(myts_holt_winters_damped, series='Holt Winters damped', PI=FALSE)

I plotted the next 100 forecasts to make the comparison between the two models clear. With the damped method the forecasts still increase, but not as much as with the undamped model.

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

myts_oneStep_holt_winters <-  forecast(myts_holt_winters, h=1)
accuracy(myts_oneStep_holt_winters)

##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.9212824 25.20381 18.77683 0.06856226 1.979315 0.3016982
##                    ACF1
## Training set -0.1217931

myts_oneStep_holt_winters_damped <- forecast(myts_holt_winters_damped, h=1)
accuracy(myts_oneStep_holt_winters_damped)

##                  ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 2.9059 25.10059 18.74334 0.2366077 1.993423 0.3011601
##                    ACF1
## Training set -0.1394101

The RMSEs of the two models are very similar (25.20 undamped vs. 25.10 damped). The plot shows that the damped model guards against the over-forecasting of the undamped multiplicative model, so I think the damped method is the better choice, especially when the horizon is long.

Check that the residuals from the best method look like white noise.

checkresiduals(myts_holt_winters_damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 285.62, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24

The Ljung-Box test rejects the white-noise hypothesis (p-value < 2.2e-16), and the ACF plot shows spikes outside the white-noise bounds, so the residuals do not look like white noise.
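The Ljung-Box output above reports df = 7, which is the 24 lags used minus the 17 model degrees of freedom. The same test is available in base R as Box.test(). The sketch below applies it to a deterministic toy series with an obvious 12-month pattern. That series is an assumption for illustration and stands in for residuals that are clearly not white noise; it is not the retail residuals.

```r
# Toy "residuals" with a strong seasonal pattern (synthetic, deterministic)
toy_resid <- sin(2 * pi * (1:240) / 12)

# Ljung-Box test with 24 lags; fitdf removes the model's degrees of freedom
lb <- Box.test(toy_resid, lag = 24, type = "Ljung-Box", fitdf = 17)

print(lb$parameter)  # degrees of freedom: lag - fitdf
print(lb$p.value)    # a small p-value means the series is not white noise
```

A small p-value, as in the retail output, indicates autocorrelation left in the residuals that the model has not captured.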

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?

myts_train <- window(myts, end = c(2010, 12))
myts_test <- window(myts, start = 2011)

myts_train_holt_winters <- hw(myts_train, damped = TRUE, seasonal = "multiplicative")
accuracy(myts_train_holt_winters, myts_test)

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set   4.089724 25.68057 18.92510  0.371369 2.057716 0.3068053
## Test set     -27.787782 41.08034 32.12435 -1.279976 1.487404 0.5207858
##                     ACF1 Theil's U
## Training set -0.05009924        NA
## Test set      0.13519074 0.3334222

myts_train_snaive <- snaive(myts_train)
accuracy(myts_train_snaive,myts_test)

##                    ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
##                   ACF1 Theil's U
## Training set 0.6018274        NA
## Test set     0.2686595 0.9036205

Yes, the damped Holt-Winters model beats the seasonal naïve model: the test-set RMSE is 41.08 for Holt-Winters versus 109.63 for snaive.
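The seasonal naïve benchmark simply repeats the last observed year, so it misses any growth between the training and test periods. A minimal base-R sketch of the train/test split and the test-set RMSE follows, using a synthetic monthly series. The data and the helper `rmse()` are assumptions for illustration, not the retail series.

```r
# Synthetic monthly series: a repeating pattern plus a steady upward trend
pattern <- c(5, 3, 4, 6, 8, 9, 10, 9, 7, 6, 5, 12)
toy <- ts(rep(pattern, times = 5) + 0.5 * (0:59),
          frequency = 12, start = c(2007, 1))

# Train to the end of 2010, test on 2011
train <- window(toy, end = c(2010, 12))
test  <- window(toy, start = c(2011, 1))

# Seasonal naive forecast: repeat the last 12 training observations
last_year <- tail(as.numeric(train), 12)
snaive_fc <- rep(last_year, length.out = length(test))

# Test-set RMSE of the seasonal naive forecast
rmse <- function(e) sqrt(mean(e^2))
snaive_rmse <- rmse(as.numeric(test) - snaive_fc)
print(snaive_rmse)
```

In this toy case every test month is underforecast by one year of trend, which mirrors the large positive ME of the snaive model in the accuracy table above.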