Exercises 7.1, 7.5, 7.6, 7.7, 7.8 and 7.9 from the Hyndman online Forecasting book. The RPubs version of this work can be found here, and source/data can be found on GitHub here.
#clear the workspace
rm(list = ls())
#load required packages
library(fpp2)
library(knitr)
Consider the pigs series - the number of pigs slaughtered in Victoria each month.
First we’ll take a quick look at the data.
plot(pigs)
tail(pigs,5)
## Apr May Jun Jul Aug
## 1995 84307 114896 106749 87892 100506
fc <- ses(pigs,h=4)
fc$model
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
The optimal values for \(\alpha\) and \(\ell_0\) are 0.2971 and 77260.0561 respectively, as shown in the model output above.
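As a sanity check on the intuition (a sketch added here, not part of the original solution), the one-step SES forecast can be reproduced by hand by iterating the level recursion \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\) with the fitted values above:
#sketch: reproduce the SES point forecast from the fitted parameters above
alpha <- 0.2971        #fitted smoothing parameter (from the model output)
level <- 77260.0561    #fitted initial level l_0 (from the model output)
for (y in pigs) {
  level <- alpha * y + (1 - alpha) * level
}
#the final level and the ses() forecast should both be close to 98816.41
c(manual = level, ses = fc$mean[1])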
fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
A simple point forecast of 98816.41 is shown in the table above for each of the next 4 months. The flat forecast is expected: with simple exponential smoothing, every point forecast equals the final estimated level.
my.ci95 <- c(l=fc$mean[1] - 1.96*sd(residuals(fc)),
u=fc$mean[1] + 1.96*sd(residuals(fc)))
#first forecast & 95% interval -> [1,2]
model.ci95 <- c(l = fc$lower[1,2],
u= fc$upper[1,2])
df <- data.frame(my.ci95, model.ci95)
row.names(df) <- c("Lower","Upper")
colnames(df) <- c("My CI","Model CI")
kable(df)
| | My CI | Model CI |
|---|---|---|
| Lower | 78679.97 | 78611.97 |
| Upper | 118952.84 | 119020.84 |
The intervals are similar, but not identical.
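A quick check of where the small difference comes from (a sketch, assuming fc from above is still in the workspace): the model builds its interval from its own estimate of the residual standard deviation (sigma = 10308.58 in the output above), which differs slightly from the plain sample standard deviation of the residuals used in my calculation.
#implied sd used by the model vs. sample sd of the residuals
c(sample.sd = sd(residuals(fc)),
  model.sd = (fc$upper[1,2] - fc$mean[1]) / qnorm(0.975))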
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
autoplot(books)
Both series exhibit a positive trend over the period observed. The oscillations between periods are large, and it appears as though there may be some kind of cyclicality (i.e., sales bounce between low and high values along the same general trajectory).
fc.pb <- ses(books[,1],h=4)
fc.hc <- ses(books[,2],h=4)
autoplot(books) +
autolayer(fc.pb, series="Paperback", PI=F) +
autolayer(fc.hc, series="Hardcover", PI=F)
rmse <- data.frame(c(paperback = accuracy(fc.pb)[2] ,
hardcover =accuracy(fc.hc)[2] ))
colnames(rmse) <- c("RMSE")
kable(rmse)
| | RMSE |
|---|---|
| paperback | 33.63769 |
| hardcover | 31.93101 |
We will continue with the daily sales of paperback and hardcover books in data set books.
fc.pb.holt <- holt(books[,1],h=4)
fc.hc.holt <- holt(books[,2],h=4)
autoplot(books) +
autolayer(fc.pb.holt, series="Paperback", PI=F) +
autolayer(fc.hc.holt, series="Hardcover", PI=F)
rmse.holt <- data.frame(c(paperback = accuracy(fc.pb.holt)[2] ,
hardcover =accuracy(fc.hc.holt)[2] ))
rmse <- data.frame(rmse,rmse.holt)
colnames(rmse) <- c("SES RMSE", "HOLT RMSE")
kable(rmse)
| | SES RMSE | HOLT RMSE |
|---|---|---|
| paperback | 33.63769 | 31.13692 |
| hardcover | 31.93101 | 27.19358 |
compare <- data.frame(paperback = c(fc.pb$mean[1],fc.pb.holt$mean[1]),
hardcover = c(fc.hc$mean[1],fc.hc.holt$mean[1]))
row.names(compare) <- c("SES","HOLT")
kable(compare)
| | paperback | hardcover |
|---|---|---|
| SES | 207.1097 | 239.5601 |
| HOLT | 209.4668 | 250.1739 |
We see a few things here:

* Comparing the plot in 7.6-A with the one in 7.5-B, the Holt forecast retains the positive drift, which seems like an important component here. The point estimates in the table above show this numerically.
* The RMSE is also lower for the Holt method (7.6-B) for both paperbacks and hardcovers.
In this case, potentially because of the positive trend in the data, the Holt method appears to be a better choice.
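To make the role of the trend concrete, here is a small sketch (assuming the fitted objects above, and that the ets state matrix stores its level and slope in columns named "l" and "b"): Holt's h-step forecast is the final level plus h times the final slope, whereas SES simply carries the final level forward.
#sketch: rebuild Holt's first paperback forecast from the final states
states <- fc.pb.holt$model$states
l.T <- as.numeric(tail(states[,"l"], 1))   #final level
b.T <- as.numeric(tail(states[,"b"], 1))   #final slope
#level + slope should be close to the holt() point forecast (~209.47)
c(manual = l.T + b.T, holt = fc.pb.holt$mean[1])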
#compute RMSE-based 95% CIs for paperbacks
my.ci95.pb <- c(l=fc.pb.holt$mean[1] - 1.96*accuracy(fc.pb.holt)[2],
u=fc.pb.holt$mean[1] + 1.96*accuracy(fc.pb.holt)[2])
ses.pb <- c(l= fc.pb$lower[1,2],
h= fc.pb$upper[1,2])
h.pb <- c(l= fc.pb.holt$lower[1,2],
h= fc.pb.holt$upper[1,2])
#compute RMSE-based 95% CIs for hardcovers
my.ci95.hc <- c(l=fc.hc.holt$mean[1] - 1.96*accuracy(fc.hc.holt)[2],
u=fc.hc.holt$mean[1] + 1.96*accuracy(fc.hc.holt)[2])
ses.hc <- c(l= fc.hc$lower[1,2],
h= fc.hc$upper[1,2])
h.hc <- c(l= fc.hc.holt$lower[1,2],
h= fc.hc.holt$upper[1,2])
#combine the data
paperback <- data.frame(rbind(my.ci95.pb,ses.pb,h.pb))
hardcover <- data.frame(rbind(my.ci95.hc,ses.hc,h.hc))
rnames <- c("RMSE","SES","HOLT")
cnames <- c("Lower","Upper")
row.names(paperback) <- rnames
colnames(paperback) <- cnames
row.names(hardcover) <- rnames
colnames(hardcover) <- cnames
kable(paperback,caption="Paperbacks")
| | Lower | Upper |
|---|---|---|
| RMSE | 148.4384 | 270.4951 |
| SES | 138.8670 | 275.3523 |
| HOLT | 143.9130 | 275.0205 |
kable(hardcover,caption="Hardcover")
| | Lower | Upper |
|---|---|---|
| RMSE | 196.8745 | 303.4733 |
| SES | 174.7799 | 304.3403 |
| HOLT | 192.9222 | 307.4256 |
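Comparing the two tables, the three 95% intervals are broadly similar for both series; the RMSE-based interval is the narrowest and the SES interval the widest in each case.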
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
#plain vanilla
fc <- holt(eggs,h=100)
autoplot(eggs) +
autolayer(fc,series="Holt's Method",PI=T)
fc$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100)
##
## Smoothing parameters:
## alpha = 0.8124
## beta = 1e-04
##
## Initial states:
## l = 314.7232
## b = -2.7222
##
## sigma: 27.1665
##
## AIC AICc BIC
## 1053.755 1054.437 1066.472
fc.acc <- accuracy(fc)
fc.acc
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04499087 26.58219 19.18491 -1.142201 9.653791 0.9463626
## ACF1
## Training set 0.01348202
#damped
fc.d <- holt(eggs, h = 100, damped = T)
autoplot(eggs) +
autolayer(fc.d, series="Holt's Method (Damped)", PI=T)
fc.d$model
## Damped Holt's method
##
## Call:
## holt(y = eggs, h = 100, damped = T)
##
## Smoothing parameters:
## alpha = 0.8462
## beta = 0.004
## phi = 0.8
##
## Initial states:
## l = 276.9842
## b = 4.9966
##
## sigma: 27.2755
##
## AIC AICc BIC
## 1055.458 1056.423 1070.718
fc.d.acc <- accuracy(fc.d)
fc.d.acc
## ME RMSE MAE MPE MAPE MASE
## Training set -2.891496 26.54019 19.2795 -2.907633 10.01894 0.9510287
## ACF1
## Training set -0.003195358
#bc
fc.bc <- holt(eggs, h = 100, lambda = "auto")
autoplot(eggs) +
autolayer(fc.bc, series="Holt's Method w/BoxCox", PI=TRUE)
fc.bc$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100, lambda = "auto")
##
## Box-Cox transformation: lambda= 0.3956
##
## Smoothing parameters:
## alpha = 0.809
## beta = 1e-04
##
## Initial states:
## l = 21.0322
## b = -0.1144
##
## sigma: 1.0549
##
## AIC AICc BIC
## 443.0310 443.7128 455.7475
fc.bc.acc <- accuracy(fc.bc)
fc.bc.acc
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7736844 26.39376 18.96387 -1.072416 9.620095 0.9354593
## ACF1
## Training set 0.03887152
#biasAdj
fc.ba <- holt(eggs, h = 100, lambda = "auto",biasadj = T)
autoplot(eggs) +
autolayer(fc.ba, series="Holt's Method w/BoxCox (Bias Adjusted)", PI=TRUE)
fc.ba$model
## Holt's method
##
## Call:
## holt(y = eggs, h = 100, lambda = "auto", biasadj = T)
##
## Box-Cox transformation: lambda= 0.3956
##
## Smoothing parameters:
## alpha = 0.809
## beta = 1e-04
##
## Initial states:
## l = 21.0322
## b = -0.1144
##
## sigma: 1.0549
##
## AIC AICc BIC
## 443.0310 443.7128 455.7475
fc.ba.acc <- accuracy(fc.ba)
fc.ba.acc
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.2015298 26.38689 18.99362 -1.63043 9.713172 0.9369265 0.0383996
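Note that the fitted Box-Cox model (and its AIC) is identical to the previous fit; biasadj = TRUE only changes the back-transformation, adjusting the point forecasts from medians of the forecast distributions to means, which is why the accuracy figures shift only slightly.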
#exponential
fc.ex <- holt(eggs, h = 100, exponential=T)
autoplot(eggs) +
autolayer(fc.ex, series="Holt's Method (Exponential)", PI=TRUE)
fc.ex$model
## Holt's method with exponential trend
##
## Call:
## holt(y = eggs, h = 100, exponential = T)
##
## Smoothing parameters:
## alpha = 0.819
## beta = 1e-04
##
## Initial states:
## l = 316.8174
## b = 0.9851
##
## sigma: 0.1393
##
## AIC AICc BIC
## 1047.264 1047.946 1059.980
fc.ex.acc <- accuracy(fc.ex)
fc.ex.acc
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.4918791 26.49795 19.29399 -1.263235 9.766049 0.9517436 0.0103908
methods.used <- c("Holt","Damped","BoxCox","BiasAdjusted BC","Exponential")
df <- data.frame(fc.acc[2],
fc.d.acc[2],
fc.bc.acc[2],
fc.ba.acc[2],
fc.ex.acc[2])
colnames(df) <- methods.used
df <- t(df)
colnames(df) <- c("RMSE")
kable(df)
| | RMSE |
|---|---|
| Holt | 26.58219 |
| Damped | 26.54019 |
| BoxCox | 26.39376 |
| BiasAdjusted BC | 26.38689 |
| Exponential | 26.49795 |
Of the methods examined, the Box-Cox variants appear to have the best (lowest) RMSE, although the difference relative to the other methods is small.
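As an optional follow-up (a sketch, not part of the original comparison): the RMSEs above are all in-sample, so a fairer comparison of, say, the plain and damped Holt variants is their one-step cross-validated RMSE via tsCV():
#sketch: out-of-sample (time series cross-validation) RMSE comparison
e.holt   <- tsCV(eggs, holt, h = 1)
e.damped <- tsCV(eggs, holt, damped = TRUE, h = 1)
sqrt(c(holt = mean(e.holt^2, na.rm = TRUE),
       damped = mean(e.damped^2, na.rm = TRUE)))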
Recall your retail time series data (from Exercise 3 in Section 2.10).
#data-loading code borrowed from Homework 1
temp_file <- tempfile(fileext = ".xlsx")
download.file(url = "https://github.com/plb2018/DATA624/raw/master/Homework1/retail.xlsx",
destfile = temp_file,
mode = "wb",
quiet = TRUE)
retaildata <- readxl::read_excel(temp_file,skip=1)
aussie.retail <- ts(retaildata[,"A3349388W"],
frequency=12, start=c(1982,4))
autoplot(aussie.retail)
The multiplicative method is required here because the seasonal variations appear to change in proportion to the level of the series. Essentially, a multiplicative method retains the relative size of the seasonal effect regardless of the series level, whereas an additive method may not.
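For contrast (a sketch only; the additive fit is not required for the answer), the additive version can be fit the same way so that its training RMSE can be set against the multiplicative fits below:
#sketch: additive seasonal Holt-Winters for comparison
fc.hw.add <- hw(aussie.retail, seasonal = "additive")
accuracy(fc.hw.add)[2]   #training RMSE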
fc.hw.mult <- hw(aussie.retail, seasonal = "multiplicative")
autoplot(fc.hw.mult)
fc.hw.mult.d <- hw(aussie.retail, seasonal = "multiplicative",damped=T)
autoplot(fc.hw.mult.d)
The damped method appears to greatly increase the forecast dispersion in this case. This is possibly because the damping discounts a trend that looks fairly persistent in the data.
df.rmse <- data.frame(accuracy(fc.hw.mult)[2],accuracy(fc.hw.mult.d)[2] )
rownames(df.rmse) <- c("RMSE")
colnames(df.rmse) <- c("Multiplicative","Damped")
kable(df.rmse)
| | Multiplicative | Damped |
|---|---|---|
| RMSE | 16.79301 | 16.92689 |
The RMSE of the non-damped method is slightly lower than that of the damped method, so the non-damped method is preferred.
checkresiduals(fc.hw.mult)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 70.814, df = 8, p-value = 3.383e-12
##
## Model df: 16. Total lags used: 24
The residuals look broadly reasonable, but there are a few things of note. The ACF shows a clear seasonal pattern and several spikes larger than we would like to see, and the Ljung-Box test strongly rejects white noise (p-value 3.383e-12). This suggests that there is room for improvement in how the model handles seasonality.
retail.train <- ts(retaildata[,"A3349388W"],
frequency=12, start=c(1982,4),end=c(2010,12))
retail.test <- ts(retaildata[,"A3349388W"],
frequency=12, start=c(2011))
fc.hw <- hw(retail.train, seasonal = "multiplicative")
fc.hw.acc <- accuracy(fc.hw,retail.test)
fc.hw.acc
## ME RMSE MAE MPE MAPE
## Training set 0.596672 15.84002 11.62637 0.05568738 2.087244
## Test set -1104.047174 1106.46287 1104.04717 -523.38285803 523.382858
## MASE ACF1 Theil's U
## Training set 0.2781349 0.1326599 NA
## Test set 26.4118528 0.3476486 92.42764
fc.hw.d <- hw(retail.train, seasonal = "multiplicative",damped = T)
fc.hw.d.acc <- accuracy(fc.hw.d,retail.test)
fc.hw.d.acc
## ME RMSE MAE MPE MAPE
## Training set 1.936942 14.70277 11.21486 0.278631 2.089594
## Test set -1042.329928 1043.68311 1042.32993 -494.684199 494.684199
## MASE ACF1 Theil's U
## Training set 0.2682903 0.02717146 NA
## Test set 24.9354061 0.03536186 87.039
fc.naive <- snaive(retail.train)
fc.naive.acc <- accuracy(fc.naive,retail.test)
fc.naive.acc
## ME RMSE MAE MPE MAPE MASE
## Training set 35.05886 53.07719 41.8012 5.913718 7.197148 1.00000
## Test set -967.65833 969.57483 967.6583 -459.074166 459.074166 23.14906
## ACF1 Theil's U
## Training set 0.88558266 NA
## Test set 0.07802661 80.8652
df.rmse <- data.frame(fc.hw.acc[2,2],fc.hw.d.acc[2,2],fc.naive.acc[2,2] )
rownames(df.rmse) <- c("RMSE")
colnames(df.rmse) <- c("Multiplicative","Damped","Naive")
kable(t(df.rmse))
| | RMSE |
|---|---|
| Multiplicative | 1106.4629 |
| Damped | 1043.6831 |
| Naive | 969.5748 |
I tried both the damped and non-damped method and was unable to beat the naive approach (in terms of RMSE) with either method.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
fc.stl <- stlf(retail.train,method="ets", lambda = "auto")
fc.stl.acc <- accuracy(fc.stl, retail.test)
fc.stl.acc
## ME RMSE MAE MPE MAPE
## Training set 9.427513e-02 12.73408 9.521244 -0.01596041 1.696864
## Test set -1.077285e+03 1079.37425 1077.284587 -510.65117188 510.651172
## MASE ACF1 Theil's U
## Training set 0.2277744 -0.05476848 NA
## Test set 25.7716180 0.33920226 90.02523
df.rmse <- data.frame(fc.hw.acc[2,2],fc.hw.d.acc[2,2],fc.naive.acc[2,2],fc.stl.acc[2,2] )
rownames(df.rmse) <- c("RMSE")
colnames(df.rmse) <- c("Multiplicative","Damped","Naive","STL +ETS")
kable(t(df.rmse))
| | RMSE |
|---|---|
| Multiplicative | 1106.4629 |
| Damped | 1043.6831 |
| Naive | 969.5748 |
| STL +ETS | 1079.3743 |
After applying STL + Box-Cox and ETS (all passed as arguments to stlf()), we see limited improvement from this new model. It outperforms the “multiplicative” model but underperforms the damped and naive models.
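As a sanity check (a sketch, not part of the original analysis), the stlf() shortcut above can be written out explicitly with stlm() and forecast(); with the same settings it should give essentially the same test-set RMSE.
#sketch: the same STL + Box-Cox + ETS pipeline written out explicitly
fit.stlm <- stlm(retail.train, method = "ets", lambda = "auto")
fc.stlm  <- forecast(fit.stlm, h = length(retail.test))
accuracy(fc.stlm, retail.test)[2,2]   #test-set RMSE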