Exercises 7.1, 7.5, 7.6, 7.7, 7.8, and 7.9 from the Hyndman online Forecasting book. The RPubs version of this work can be found here, and source/data can be found on GitHub here.

#clear the workspace
rm(list = ls())

#load required packages
library(fpp2)
library(knitr)

Question 7.1

Consider the pigs series - the number of pigs slaughtered in Victoria each month.

First we’ll take a quick look at the data

plot(pigs)

tail(pigs,5)
##         Apr    May    Jun    Jul    Aug
## 1995  84307 114896 106749  87892 100506

A) Use the ses() function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

fc <- ses(pigs,h=4)
fc$model
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

The optimal values for \(\alpha\) and \(\ell_0\) are 0.2971 and 77260.0561, respectively, as seen in the model output above.

fc
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

The point forecast for each of the next four months is the same flat value, 98816.41, as shown in the table above (SES forecasts carry no trend or seasonal component, so the point forecast is constant across the horizon).
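As a sanity check, we can reproduce that point forecast by iterating the SES level recursion \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\) by hand, starting from the reported \(\ell_0\). This sketch assumes the fitted parameters live in fc$model$par, which is where ets-based models appear to store them:

#rebuild the SES level recursion manually
alpha <- unname(fc$model$par["alpha"])
l <- unname(fc$model$par["l"])  #initial level l_0

for (y in pigs) {
  l <- alpha * y + (1 - alpha) * l
}

l  #should match fc$mean[1] of ~98816.41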

B) Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm1.96s\) where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

my.ci95 <- c(l = fc$mean[1] - 1.96*sd(residuals(fc)),
             u = fc$mean[1] + 1.96*sd(residuals(fc)))

#first forecast, 95% interval -> [1,2]
model.ci95 <- c(l = fc$lower[1,2],
                u = fc$upper[1,2])


df <- data.frame(my.ci95, model.ci95)

row.names(df) <- c("Lower","Upper")
colnames(df) <- c("My CI","Model CI")

kable(df)
        My CI       Model CI
Lower   78679.97    78611.97
Upper   118952.84   119020.84

The intervals are similar but not identical.
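The small gap comes from how the residual standard deviation is estimated: sd() subtracts the residual mean and divides by n - 1, while ses() uses the model's own sigma. A quick comparison, assuming (as appears to be the case for ets-based fits) that the residual variance is stored in fc$model$sigma2:

#compare the two estimates of the residual standard deviation
s.manual <- sd(residuals(fc))     #sample sd of the residuals
s.model <- sqrt(fc$model$sigma2)  #the sigma the model reports (10308.58 above)
c(manual = s.manual, model = s.model)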

Question 7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

A) Plot the series and discuss the main features of the data.

autoplot(books)

Both series exhibit a positive trend over the period observed. The oscillations between periods are large, and it appears as though there may be some kind of cyclicality (i.e., sales bounce between low and high values along the same general trajectory).
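One way to probe the apparent cyclicality is to look at the autocorrelation function of each series. A short check (assuming the books columns are named "Paperback" and "Hardcover", as in fpp2):

#look for autocorrelation behind the apparent cyclicality
ggAcf(books[, "Paperback"])
ggAcf(books[, "Hardcover"])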

B) Use the ses() function to forecast each series, and plot the forecasts.

fc.pb <- ses(books[,1],h=4)
fc.hc <- ses(books[,2],h=4)

autoplot(books) +
  autolayer(fc.pb, series="Paperback", PI=F) +
  autolayer(fc.hc, series="Hardcover", PI=F)

C) Compute the RMSE values for the training data in each case.

rmse <- data.frame(c(paperback = accuracy(fc.pb)[2],
                     hardcover = accuracy(fc.hc)[2]))

colnames(rmse) <- c("RMSE")

kable(rmse)
            RMSE
paperback   33.63769
hardcover   31.93101

Question 7.6

We will continue with the daily sales of paperback and hardcover books in data set books

A) Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

fc.pb.holt <- holt(books[,1],h=4)
fc.hc.holt <- holt(books[,2],h=4)

autoplot(books) +
  autolayer(fc.pb.holt, series="Paperback", PI=F) +
  autolayer(fc.hc.holt, series="Hardcover", PI=F)

B) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

rmse.holt <- data.frame(c(paperback = accuracy(fc.pb.holt)[2],
                          hardcover = accuracy(fc.hc.holt)[2]))

rmse <- data.frame(rmse,rmse.holt)

colnames(rmse) <- c("SES RMSE", "HOLT RMSE")

kable(rmse)
            SES RMSE   HOLT RMSE
paperback   33.63769   31.13692
hardcover   31.93101   27.19358

C) Compare the forecasts for the two series using both methods. Which do you think is best?

compare <- data.frame(paperback = c(fc.pb$mean[1],fc.pb.holt$mean[1]), 
                      hardcover = c(fc.hc$mean[1],fc.hc.holt$mean[1]))

row.names(compare) <- c("SES","HOLT")

kable(compare)
       paperback   hardcover
SES     207.1097    239.5601
HOLT    209.4668    250.1739

We see a few things here:

* Comparing the plot in 7.6-A against the one in 7.5-B, the Holt forecast retains the positive drift, which seems like an important component here; the point estimates in the table above show this numerically.
* The RMSE is also lower for the Holt method (7.6-B) for both paperbacks and hardcovers.

In this case, potentially because of the positive trend in the data, the Holt method appears to be a better choice.
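This matches the structure of the method: Holt extends SES with an explicit trend equation, so the forecast function drifts with slope \(b_t\) instead of staying flat:

\[
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + h b_t \\
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}
\end{aligned}
\]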

D) Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

#compute RMSE-based CIs for paperbacks
my.ci95.pb <- c(l = fc.pb.holt$mean[1] - 1.96*accuracy(fc.pb.holt)[2],
                u = fc.pb.holt$mean[1] + 1.96*accuracy(fc.pb.holt)[2])

ses.pb <- c(l = fc.pb$lower[1,2],
            u = fc.pb$upper[1,2])

h.pb <- c(l = fc.pb.holt$lower[1,2],
          u = fc.pb.holt$upper[1,2])

#compute RMSE-based CIs for hardcovers
my.ci95.hc <- c(l = fc.hc.holt$mean[1] - 1.96*accuracy(fc.hc.holt)[2],
                u = fc.hc.holt$mean[1] + 1.96*accuracy(fc.hc.holt)[2])

ses.hc <- c(l = fc.hc$lower[1,2],
            u = fc.hc$upper[1,2])

h.hc <- c(l = fc.hc.holt$lower[1,2],
          u = fc.hc.holt$upper[1,2])

#combine the data
paperback <- data.frame(rbind(my.ci95.pb,ses.pb,h.pb))
hardcover <- data.frame(rbind(my.ci95.hc,ses.hc,h.hc))

rnames <-  c("RMSE","SES","HOLT")
cnames <-  c("Lower","Upper")

row.names(paperback) <- rnames
colnames(paperback) <- cnames

row.names(hardcover) <- rnames
colnames(hardcover) <- cnames

kable(paperback,caption="Paperbacks")
Paperbacks

        Lower      Upper
RMSE    148.4384   270.4951
SES     138.8670   275.3523
HOLT    143.9130   275.0205
kable(hardcover,caption="Hardcover")
Hardcover

        Lower      Upper
RMSE    196.8745   303.4733
SES     174.7799   304.3403
HOLT    192.9222   307.4256

In both cases the hand-computed RMSE-based intervals are slightly narrower than those produced by ses() and holt(), though all three sets of intervals are broadly similar.

Question 7.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?
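Before running the variants, it helps to recall what damping does to Holt's forecast function: the linear drift \(hb_t\) is replaced by a geometrically decaying sum, so long-horizon forecasts level off rather than trending forever:

\[
\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\,b_t, \qquad 0 < \phi < 1
\]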

#plain vanilla

fc <- holt(eggs,h=100)
autoplot(eggs) + 
  autolayer(fc,series="Holt's Method",PI=T)

fc$model
## Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100) 
## 
##   Smoothing parameters:
##     alpha = 0.8124 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 314.7232 
##     b = -2.7222 
## 
##   sigma:  27.1665
## 
##      AIC     AICc      BIC 
## 1053.755 1054.437 1066.472
fc.acc <- accuracy(fc)

fc.acc
##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.04499087 26.58219 19.18491 -1.142201 9.653791 0.9463626
##                    ACF1
## Training set 0.01348202
#damped

fc.d <- holt(eggs, h = 100, damped = T)
autoplot(eggs) +
  autolayer(fc.d, series="Holt's Method (Damped)", PI=T)

fc.d$model
## Damped Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100, damped = T) 
## 
##   Smoothing parameters:
##     alpha = 0.8462 
##     beta  = 0.004 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 276.9842 
##     b = 4.9966 
## 
##   sigma:  27.2755
## 
##      AIC     AICc      BIC 
## 1055.458 1056.423 1070.718
fc.d.acc <- accuracy(fc.d)

fc.d.acc
##                     ME     RMSE     MAE       MPE     MAPE      MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
##                      ACF1
## Training set -0.003195358
#bc
fc.bc <- holt(eggs, h = 100, lambda = "auto")
autoplot(eggs) +
  autolayer(fc.bc, series="Holt's Method w/BoxCox", PI=TRUE)

fc.bc$model
## Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100, lambda = "auto") 
## 
##   Box-Cox transformation: lambda= 0.3956 
## 
##   Smoothing parameters:
##     alpha = 0.809 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 21.0322 
##     b = -0.1144 
## 
##   sigma:  1.0549
## 
##      AIC     AICc      BIC 
## 443.0310 443.7128 455.7475
fc.bc.acc <- accuracy(fc.bc)

fc.bc.acc
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.7736844 26.39376 18.96387 -1.072416 9.620095 0.9354593
##                    ACF1
## Training set 0.03887152
#biasAdj
fc.ba <- holt(eggs, h = 100, lambda = "auto",biasadj = T)
autoplot(eggs) +
  autolayer(fc.ba, series="Holt's Method w/BoxCox (Bias-Adjusted)", PI=TRUE)

fc.ba$model
## Holt's method 
## 
## Call:
##  holt(y = eggs, h = 100, lambda = "auto", biasadj = T) 
## 
##   Box-Cox transformation: lambda= 0.3956 
## 
##   Smoothing parameters:
##     alpha = 0.809 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 21.0322 
##     b = -0.1144 
## 
##   sigma:  1.0549
## 
##      AIC     AICc      BIC 
## 443.0310 443.7128 455.7475
fc.ba.acc <- accuracy(fc.ba)

fc.ba.acc
##                      ME     RMSE      MAE      MPE     MAPE      MASE      ACF1
## Training set -0.2015298 26.38689 18.99362 -1.63043 9.713172 0.9369265 0.0383996
#exponential
fc.ex <- holt(eggs, h = 100, exponential=T)
autoplot(eggs) +
  autolayer(fc.ex, series="Holt's Method (Exponential Trend)", PI=TRUE)

fc.ex$model
## Holt's method with exponential trend 
## 
## Call:
##  holt(y = eggs, h = 100, exponential = T) 
## 
##   Smoothing parameters:
##     alpha = 0.819 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 316.8174 
##     b = 0.9851 
## 
##   sigma:  0.1393
## 
##      AIC     AICc      BIC 
## 1047.264 1047.946 1059.980
fc.ex.acc <- accuracy(fc.ex)

fc.ex.acc
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 0.4918791 26.49795 19.29399 -1.263235 9.766049 0.9517436 0.0103908
methods.used <- c("Holt","Damped","BoxCox","Bias-Adjusted BC","Exponential")

df <- data.frame(fc.acc[2],
                 fc.d.acc[2],
                 fc.bc.acc[2],
                 fc.ba.acc[2],
                 fc.ex.acc[2])

colnames(df) <- methods.used
df <- t(df)
colnames(df) <- c("RMSE")

kable(df)
                   RMSE
Holt               26.58219
Damped             26.54019
BoxCox             26.39376
Bias-Adjusted BC   26.38689
Exponential        26.49795

Of the methods examined, the Box-Cox variations had the best RMSE (the bias-adjusted version marginally so), though the differences across methods are relatively small.

Question 7.8

Recall your retail time series data (from Exercise 3 in Section 2.10)

#data-loading code borrowed from Homework 1
temp_file <- tempfile(fileext = ".xlsx")

download.file(url = "https://github.com/plb2018/DATA624/raw/master/Homework1/retail.xlsx", 
              destfile = temp_file, 
              mode = "wb", 
              quiet = TRUE)

retaildata <- readxl::read_excel(temp_file,skip=1)

aussie.retail <- ts(retaildata[,"A3349388W"],
  frequency=12, start=c(1982,4))

autoplot(aussie.retail)

A) Why is multiplicative seasonality necessary for this series?

The multiplicative method is required here because the seasonal variations appear to change in proportion with the level of the series. Essentially, a multiplicative method will retain the relative meaning of the season, regardless of series level, whereas an arithmetic method may not do so.
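One informal way to confirm this is a seasonal plot: if the within-year pattern fans out as the level rises, the seasonal effect is proportional to the level. A quick sketch using ggseasonplot() from the forecast package on the series built above:

#seasonal swings that grow with the level point to multiplicative seasonality
ggseasonplot(aussie.retail)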

B) Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fc.hw.mult <- hw(aussie.retail, seasonal = "multiplicative")
autoplot(fc.hw.mult)

fc.hw.mult.d <- hw(aussie.retail, seasonal = "multiplicative",damped=T)
autoplot(fc.hw.mult.d)

The damped method appears to noticeably increase the forecast dispersion in this case, possibly because damping flattens out what looks like a fairly persistent trend in the data.
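To get a feel for how much the damping matters, we can set the damping parameter ourselves rather than letting the model estimate it (phi is an argument to hw(); this is a sketch comparing light and heavy damping):

#compare light (phi = 0.98) vs heavy (phi = 0.80) damping of the trend
fc.phi98 <- hw(aussie.retail, seasonal = "multiplicative", damped = TRUE, phi = 0.98)
fc.phi80 <- hw(aussie.retail, seasonal = "multiplicative", damped = TRUE, phi = 0.80)

autoplot(aussie.retail) +
  autolayer(fc.phi98, series = "phi = 0.98", PI = FALSE) +
  autolayer(fc.phi80, series = "phi = 0.80", PI = FALSE)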

C) Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

df.rmse <- data.frame(accuracy(fc.hw.mult)[2],accuracy(fc.hw.mult.d)[2] )

rownames(df.rmse) <- c("RMSE")
colnames(df.rmse) <- c("Multiplicative","Damped")

kable(df.rmse)
        Multiplicative   Damped
RMSE    16.79301         16.92689

The RMSE of the non-damped method is slightly lower than that of the damped method, so the non-damped version is preferred.

D) Check that the residuals from the best method look like white noise.

checkresiduals(fc.hw.mult)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 70.814, df = 8, p-value = 3.383e-12
## 
## Model df: 16.   Total lags used: 24

The residuals do not quite look like white noise. The Ljung-Box test strongly rejects the white-noise hypothesis (p ≈ 3.4e-12), and the ACF plot shows a clear seasonal pattern along with several spikes larger than we would like to see. This suggests there is room for improvement in how the model captures the seasonality.

E) Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?

#use window() to split the series so the test set keeps its true dates;
#re-labelling the full series with ts(start = c(2011)) would misalign it
retail.train <- window(aussie.retail, end = c(2010, 12))

retail.test <- window(aussie.retail, start = c(2011, 1))

fc.hw <- hw(retail.train, seasonal = "multiplicative")
fc.hw.acc <- accuracy(fc.hw,retail.test)
fc.hw.acc
##                        ME       RMSE        MAE           MPE       MAPE
## Training set     0.596672   15.84002   11.62637    0.05568738   2.087244
## Test set     -1104.047174 1106.46287 1104.04717 -523.38285803 523.382858
##                    MASE      ACF1 Theil's U
## Training set  0.2781349 0.1326599        NA
## Test set     26.4118528 0.3476486  92.42764
fc.hw.d <- hw(retail.train, seasonal = "multiplicative",damped = T)
fc.hw.d.acc <- accuracy(fc.hw.d,retail.test)
fc.hw.d.acc
##                        ME       RMSE        MAE         MPE       MAPE
## Training set     1.936942   14.70277   11.21486    0.278631   2.089594
## Test set     -1042.329928 1043.68311 1042.32993 -494.684199 494.684199
##                    MASE       ACF1 Theil's U
## Training set  0.2682903 0.02717146        NA
## Test set     24.9354061 0.03536186    87.039
fc.naive <- snaive(retail.train)
fc.naive.acc <- accuracy(fc.naive,retail.test)
fc.naive.acc
##                      ME      RMSE      MAE         MPE       MAPE     MASE
## Training set   35.05886  53.07719  41.8012    5.913718   7.197148  1.00000
## Test set     -967.65833 969.57483 967.6583 -459.074166 459.074166 23.14906
##                    ACF1 Theil's U
## Training set 0.88558266        NA
## Test set     0.07802661   80.8652
df.rmse <- data.frame(fc.hw.acc[2,2],fc.hw.d.acc[2,2],fc.naive.acc[2,2] )

rownames(df.rmse) <- c("RMSE")
colnames(df.rmse) <- c("Multiplicative","Damped","Naive")

kable(t(df.rmse))
                 RMSE
Multiplicative   1106.4629
Damped           1043.6831
Naive             969.5748

I tried both the damped and non-damped methods and was unable to beat the seasonal naive approach (in terms of test-set RMSE) with either.

Question 7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
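For context, stlf() chains several steps in one call; roughly (and assuming this mirrors the internals), it Box-Cox transforms the series, runs an STL decomposition, forecasts the seasonally adjusted component with ETS, then re-seasonalizes and back-transforms. A manual sketch of the first stages:

#a rough, manual version of the stlf() pipeline (sketch only)
lam <- BoxCox.lambda(retail.train)                 #choose a Box-Cox lambda
decomp <- stl(BoxCox(retail.train, lambda = lam),  #STL on the transformed series
              s.window = "periodic")
fc.adj <- forecast(ets(seasadj(decomp)), h = 24)   #ETS on the seasonally adjusted part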

fc.stl <- stlf(retail.train,method="ets", lambda = "auto")
fc.stl.acc <- accuracy(fc.stl, retail.test)
fc.stl.acc 
##                         ME       RMSE         MAE           MPE       MAPE
## Training set  9.427513e-02   12.73408    9.521244   -0.01596041   1.696864
## Test set     -1.077285e+03 1079.37425 1077.284587 -510.65117188 510.651172
##                    MASE        ACF1 Theil's U
## Training set  0.2277744 -0.05476848        NA
## Test set     25.7716180  0.33920226  90.02523
df.rmse <- data.frame(fc.hw.acc[2,2],fc.hw.d.acc[2,2],fc.naive.acc[2,2],fc.stl.acc[2,2])

rownames(df.rmse) <- c("RMSE")
colnames(df.rmse) <- c("Multiplicative","Damped","Naive","STL + ETS")

kable(t(df.rmse))
                 RMSE
Multiplicative   1106.4629
Damped           1043.6831
Naive             969.5748
STL + ETS        1079.3743

After applying STL + Box-Cox and ETS (all passed as arguments to stlf()), we see limited improvement from this new model: it outperforms the “multiplicative” model on the test set but underperforms the damped and seasonal naive models.