Problem No. 7.1

Consider the pigs series—the number of pigs slaughtered in Victoria each month.

Use the ses() function in R to find the optimal values of \(\hat{\alpha}\) and \(\ell_0\), and generate forecasts for the next four months.

Load the data and create the model:

data(pigs)

m <- ses(pigs, h=4)
summary(m)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
##                    ACF1
## Training set 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

The output tells us:

The smoothing parameter that minimizes the error (presumably the sum of squared errors) between the model and the empirical data is \(\hat{\alpha} = 0.2971\).
The optimal initial level that minimizes the error is \(\ell_0 = 77,260\).
The root mean squared error is \(RMSE = 10,253.6\).
The forecast for each of the next four months is \(98,816\) slaughtered pigs. SES point forecasts are flat, so all four values are identical.
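The identical point forecasts follow from the SES recursion itself: the level is updated as \(\ell_t = \alpha y_t + (1 - \alpha)\ell_{t-1}\), and every \(h\)-step forecast equals the final level. The following is a minimal base-R sketch of that recursion. The helper `ses_manual()` and the toy data are hypothetical, introduced only for illustration; they are not part of the forecast package, and no parameters are optimized here.

```r
# Hypothetical helper: simple exponential smoothing with a fixed alpha and l0.
ses_manual <- function(y, alpha, l0, h = 4) {
  level <- l0
  fitted <- numeric(length(y))
  for (t in seq_along(y)) {
    fitted[t] <- level                           # one-step forecast of y[t]
    level <- alpha * y[t] + (1 - alpha) * level  # update the level
  }
  # Every forecast horizon gets the same value: the final level
  list(fitted = fitted, mean = rep(level, h))
}

toy <- c(10, 12, 11, 13, 12)   # made-up data, not the pigs series
fc <- ses_manual(toy, alpha = 0.3, l0 = 10)
fc$mean
```

With \(\alpha = 1\) the forecast collapses to the last observation, and with \(\alpha = 0\) it stays at \(\ell_0\); the fitted \(\hat{\alpha} = 0.2971\) sits between those extremes.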

Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96 s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

First, calculate our estimate of \(\sigma\):

( sigma <- sd(resid(m)) )

## [1] 10273.69

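The model’s reported value is slightly larger, at \(10,308.58\). The gap appears to come from the divisor: sd() centers the residuals and divides by \(n - 1\), while the model’s figure is consistent with dividing the sum of squared residuals by \(n\) minus the number of estimated parameters.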
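One way to check where the two values of \(\sigma\) come from is to rebuild both from the reported error measures. This is a sketch under stated assumptions: that the pigs series has \(n = 188\) monthly observations (January 1980 to August 1995), and that the model divides the sum of squared residuals by \(n - k\) with \(k = 2\) estimated parameters (\(\alpha\) and \(\ell_0\)). Neither is stated in the output above; the arithmetic merely agrees with it.

```r
# Values copied from the summary() output above
rmse <- 10253.6    # training-set RMSE
me   <- 385.8721   # training-set mean error

# Assumptions (not shown in the output): series length and parameter count
n <- 188
k <- 2

# Model-style sigma: sqrt(SSE / (n - k)), rewritten in terms of RMSE
sigma_model <- rmse * sqrt(n / (n - k))

# sd()-style sigma: centre the residuals, divide by n - 1
sigma_sd <- sqrt((rmse^2 - me^2) * n / (n - 1))

round(c(sigma_model = sigma_model, sigma_sd = sigma_sd), 2)
```

Under these assumptions the two results land within rounding error of 10308.58 and 10273.69 respectively.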

Now, calculate the 95 percent prediction interval:

mu <- m$mean[1] 
mu + c(-1, 1) * 1.96 * sigma

## [1]  78679.97 118952.84

This closely approximates R’s interval: \((78611.97, 119020.8)\).

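Note that if you use the model’s \(\hat{\sigma} = 10,308.58\) rather than my computed sigma, the prediction interval matches the model output to within a fraction of a unit. The small remaining gap is what you would expect from rounding the normal quantile to 1.96: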

mu + c(-1, 1) * 1.96 * 10308.58

## [1]  78611.59 119021.22
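The last interval still differs from R’s in the decimal places. A likely reason is that 1.96 is a rounded normal quantile. The sketch below repeats the calculation with `qnorm(0.975)`, using the point forecast and \(\hat{\sigma}\) copied from the output above; that the package uses this exact quantile is an assumption, though the numbers agree with it.

```r
# Values copied from the model output above
mu <- 98816.41           # first point forecast
sigma_model <- 10308.58  # sigma reported by summary(m)

# Unrounded 97.5% standard normal quantile (about 1.959964)
z <- qnorm(0.975)

interval <- mu + c(-1, 1) * z * sigma_model
round(interval, 2)
```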

Problem No. 7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

Plot the series and discuss the main features of the data.

The books dataset consists of two daily time series of book sales at the store over a thirty-day period. Sales are broken out into paperback and hardcover books.

The plot below shows both are on an upward trend. There do seem to be repeating patterns: paperback sales alternate between ‘up’ and ‘down’ periods every 2–3 days, while the hardcover pattern is wider, maybe 8–9 days.

data(books)
autoplot(books, facet=TRUE)

First, let’s examine basic descriptive statistics about each series. The dispersion is greater for hardback sales than paperback:

sapply(books, sd, 1)

## Paperback Hardcover 
##  35.48054  40.30152

Both series have medians close to their means, indicating rough symmetry around the mean. This store seems to sell more hardcovers on average than paperbacks.

summary(books)

##    Paperback       Hardcover    
##  Min.   :111.0   Min.   :128.0  
##  1st Qu.:167.2   1st Qu.:170.5  
##  Median :189.0   Median :200.5  
##  Mean   :186.4   Mean   :198.8  
##  3rd Qu.:207.2   3rd Qu.:222.0  
##  Max.   :247.0   Max.   :283.0

The two series are moderately correlated at 0.612, with a roughly linear relationship:

GGally::ggpairs(books)

Finally, examine the series’ autocorrelation. The first plot of paperbacks shows significant autocorrelation on the third lag. For the hardcover sales, the first and second lags are significant. It’s easy to see the pattern of 8–9 days in the first 9 lags as well.

ggAcf(books[,1])

ggAcf(books[,2])
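Whether a lag counts as "significant" in plots like these is conventionally judged against bounds of \(\pm z_{0.975} / \sqrt{n}\). The sketch below computes that bound for \(n = 30\) and flags lags that exceed it, using base R’s `acf()` on a made-up series. The data are simulated rather than the books series, and the assumption that `ggAcf()` draws the same bounds is mine.

```r
n <- 30                            # same length as the books series
bound <- qnorm(0.975) / sqrt(n)    # conventional 95% significance bound

set.seed(1)
x <- cumsum(rnorm(n))              # made-up series, NOT the books data

# Sample autocorrelations for lags 1..10 (drop lag 0, which is always 1)
r <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]

round(bound, 3)
which(abs(r) > bound)              # lags that exceed the bound
```

With only 30 observations the bound is wide (roughly 0.36), so only fairly strong autocorrelations register as significant.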

Use the ses() function to forecast each series, and plot the forecasts.

Both models get the series partially right, but there are real discrepancies between the model and the data. In particular, the SES forecasts are flat, so neither model carries the upward trend in the data forward.

m1_paper <- ses(books[,1], h=4)
m1_hard <- ses(books[,2], h=4)

autoplot(m1_hard) +
  autolayer(fitted(m1_hard), series='Modeled') +
  labs(x='Day', y='Hardback books sold')

autoplot(m1_paper) +
  autolayer(fitted(m1_paper), series='Modeled') +
  labs(x='Day', y='Paperback books sold')

Model evaluation:

	\(AIC\)	\(RMSE\)
\(M_1\) paperback	\(318.97\)	\(33.64\)
\(M_1\) hardback	\(315.85\)	\(31.93\)

Compute the RMSE values for the training data in each case.

The formula for root mean squared error is:

\[RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^n r_i^2 } = \sqrt{ \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 }\]

where \(y_i\) is the \(i\)-th observation of the variable being modeled \(y\), \(\hat{y}_i\) is the predicted value for observation \(i\), and the residual \(r_i = y_i - \hat{y}_i\) is the difference between the two. Calculate for each model:

sqrt(mean(resid(m1_paper)^2))

## [1] 33.63769

sqrt(mean(resid(m1_hard)^2))

## [1] 31.93101

Problem No. 7.6

We will continue with the daily sales of paperback and hardcover books in data set books.

Apply Holt’s linear method.

Holt’s linear method is appropriate if we determine the sales data is sufficiently trended:

m2_paper <- holt(books[,1], h=4)
m2_hard <- holt(books[,2], h=4)

autoplot(m2_paper)

autoplot(m2_hard)

We can immediately see the forecasts are now trended—the previous models predicted the same value for all four prediction periods.
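The trended forecasts come from the extra slope state in Holt’s method: the forecast \(h\) steps ahead is the final level plus \(h\) times the final slope. Below is a minimal base-R sketch of the textbook recursion. The helper `holt_manual()`, its fixed parameters, and the toy data are hypothetical; `holt()` itself estimates the parameters and may parameterize the slope smoothing differently.

```r
# Hypothetical helper: Holt's linear trend method with fixed parameters.
holt_manual <- function(y, alpha, beta, l0, b0, h = 4) {
  level <- l0
  trend <- b0
  for (t in seq_along(y)) {
    prev_level <- level
    # Level: blend the new observation with the previous one-step forecast
    level <- alpha * y[t] + (1 - alpha) * (prev_level + trend)
    # Slope: blend the latest change in level with the previous slope
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # h-step forecasts extend the final level along the final slope
  level + seq_len(h) * trend
}

# A perfectly linear toy series is continued exactly
fc <- holt_manual(1:10, alpha = 0.5, beta = 0.2, l0 = 0, b0 = 1)
fc   # 11 12 13 14
```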

Model evaluation:

	\(AIC\)	\(RMSE\)
\(M_2\) paperback	\(318.34\)	\(31.14\)
\(M_2\) hardback	\(310.21\)	\(27.19\)

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question.

As before,

sqrt(mean(resid(m2_paper)^2))

## [1] 31.13692

sqrt(mean(resid(m2_hard)^2))

## [1] 27.19358

Compare the forecasts for the two series using both methods. Which do you think is best?

Putting the two models together, we see that the Holt linear method produces a slightly better model in terms of \(RMSE\). However, the \(AIC\), a measure that tries to estimate the out-of-sample error, has not changed substantially.

	\(AIC\)	\(RMSE\)
\(M_1\) paperback	\(318.97\)	\(33.64\)
\(M_2\) paperback	\(318.34\)	\(31.14\)

For the hardbacks, the Holt linear method produces more of a benefit, but still it’s not much.

	\(AIC\)	\(RMSE\)
\(M_1\) hardback	\(315.85\)	\(31.93\)
\(M_2\) hardback	\(310.21\)	\(27.19\)

Which time series model should we use? This is actually a tricky question.

The Holt linear model is superior to SES. However, the improvement is very slim, and arguably due to randomness. It could also be the case that the apparent trend in the data would disappear with a longer time series, or be reduced to seasonality. Additional data would be helpful; presumably another 30 days of data will be available in 30 days.

In practice, I would suggest we use the Holt linear model, as it’s just as easy to implement in any case. The only exception is if the business side suspected the data is not actually trended. In any case, I would reassess as more data comes in.

Calculate a 95% prediction interval

As above,

m2_hard$mean[1] + c(-1, 1) * 1.96 * sd(resid(m2_hard))

## [1] 195.9640 304.3838

m2_paper$mean[1] + c(-1, 1) * 1.96 * sd(resid(m2_paper))

## [1] 147.8390 271.0945

Problem No. 7.8

Recall your retail time series data (from Exercise 3 in Section 2.10).

Load the data:

retaildata <- readxl::read_excel('~/Downloads/retail.xlsx', skip=1)
myts <- ts(retaildata[,55], frequency=12, start=c(1982, 4))

autoplot(myts)

Why is multiplicative seasonality necessary for this series?

In the plot of the time series above, we see the ‘amplitude’ of the seasonality starts small, and increases. By the end, it is several times larger than the beginning.

How should that be modeled? With an additive model, the seasonal ‘amplitude’ is necessarily constant, regardless of the level of the series. A multiplicative model, however, scales the seasonal swings with the level, so the amplitude grows as the series grows.
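The difference is visible in the textbook forecast equations for the two Holt-Winters variants, stated here in the usual notation, with \(m\) the seasonal period and \(k\) the integer part of \((h-1)/m\). In the additive form the seasonal term is added to the trended level; in the multiplicative form it scales it:

\[\hat{y}_{t+h|t} = \ell_t + h b_t + s_{t+h-m(k+1)} \qquad \text{(additive)}\]

\[\hat{y}_{t+h|t} = (\ell_t + h b_t) \, s_{t+h-m(k+1)} \qquad \text{(multiplicative)}\]

For example, a seasonal index of \(s = 1.2\) adds 20 units when the trended level is 100 but 60 units when it is 300, which is the widening pattern seen in the plot.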

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

Multiplicative Holt-Winters model, forecasting 26 months ahead (h=26 on this monthly series):

m3 <- hw(myts, seasonal='multiplicative', h=26)

m4 <- hw(myts, seasonal='multiplicative', h=26, damped=TRUE)

autoplot(myts) +
  autolayer(m3, series='HW mult', PI=FALSE) +
  autolayer(m4, series='HW mult (dampened)', PI=FALSE) +
  guides(colour=guide_legend(title=element_blank())) +
  theme(legend.position='bottom')

The dampened forecast is ‘below’ the vanilla HW forecast. That is, it has dampened the trend.

When should dampening be used? When we expect a time series’ rate of growth to slow as time proceeds. Often, time series will not grow at high rates forever. One can adjust the dampening parameter between 0 and 1 to reflect the prior beliefs about how fast growth will decline in the future.
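To see what the damping parameter does, recall that a damped trend replaces the \(h b_t\) term with \((\phi + \phi^2 + \dots + \phi^h) b_t\). The sketch below compares the two multipliers over 26 steps. The value \(\phi = 0.9\) is purely illustrative and is not the value estimated for m4.

```r
phi <- 0.9          # illustrative damping parameter (an assumption)
h <- 1:26           # forecast horizons

undamped_mult <- h                 # slope multiplier without damping
damped_mult <- cumsum(phi^h)       # phi + phi^2 + ... + phi^h

# As h grows, the damped multiplier approaches phi / (1 - phi)
limit <- phi / (1 - phi)

round(rbind(h = h, undamped = undamped_mult, damped = damped_mult)[, c(1, 6, 12, 26)], 2)
limit
```

The damped multiplier always sits below \(h\) and flattens out toward its limit, which is why the damped forecast runs below the undamped one.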

Compare the \(RMSE\) of the one-step forecasts from the two methods. Which do you prefer?

From the summary() output of the models:

	\(AIC\)	\(RMSE\)
\(M_3\) HW	\(3944.19\)	\(9.44\)
\(M_4\) HW dampened	\(3920.42\)	\(9.26\)

Both \(AIC\) and \(RMSE\) are improved in the dampened model—though not by much.

Check that the residuals from the best method look like white noise.

Examine the residual diagnostics, including a histogram, for each model:

checkresiduals(m3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 117.54, df = 8, p-value < 2.2e-16
## 
## Model df: 16.   Total lags used: 24

checkresiduals(m4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 112.5, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24

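To my eyes, it actually looks like the residuals for plain HW have a better (more normal) distribution compared to the dampened model. The ACF bars appear more prominent in the dampened model as well. Note, however, that the Ljung-Box test rejects the white-noise hypothesis for both models (p-value < 2.2e-16), so neither set of residuals is truly white noise.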

Practically speaking, I would use generic HW for forecasting, unless I had some prior knowledge that would indicate dampening is useful. The improvement in \(AIC\) and \(RMSE\) is slight, potentially not real, and the \(M_3\) residuals look better.
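To help read the Ljung-Box output above, here is a small base-R sketch that runs the same kind of test on simulated data with `Box.test()`. Both series are made up: one is white noise and one is an autocorrelated AR(1) process. The lag of 24 mirrors the output above, but the `fitdf` adjustment for model degrees of freedom is left at zero because no model is fitted here.

```r
set.seed(42)
wn  <- rnorm(500)                                  # simulated white noise
ar1 <- arima.sim(model = list(ar = 0.8), n = 500)  # strongly autocorrelated

# Small p-values indicate the series is NOT white noise
p_wn  <- Box.test(wn,  lag = 24, type = "Ljung-Box")$p.value
p_ar1 <- Box.test(ar1, lag = 24, type = "Ljung-Box")$p.value

c(white_noise = p_wn, ar1 = p_ar1)
```

A tiny p-value, like those reported for m3 and m4, says that autocorrelation remains in the residuals.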

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?

This data is definitely trended, and definitely requires a multiplicative model as the amplitude increases over time, so we will use the Holt-Winters multiplicative model. We will have to determine if a dampened model is needed, however.

train <- window(myts, end=c(2010,12))
test <- window(myts, start=c(2011, 1))

m5 <- hw(train, seasonal='multiplicative')
m6 <- hw(train, seasonal='multiplicative',damped=TRUE)

accuracy(m5, test)

##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  0.2054577  9.098378  6.688305 -0.2799244 4.063285 0.6025826
## Test set     -9.6598045 15.617873 12.889352 -2.9079374 4.046896 1.1612658
##                     ACF1 Theil's U
## Training set -0.17128900        NA
## Test set      0.02156953 0.2187908

accuracy(m6, test)

##                    ME      RMSE      MAE       MPE     MAPE      MASE
## Training set 1.103818  9.004195 6.664112 0.5746516 4.024478 0.6004030
## Test set     6.184397 12.072202 9.005522 2.0933244 3.040537 0.8113522
##                    ACF1 Theil's U
## Training set -0.1164456        NA
## Test set     -0.0418478 0.1848372

My seasonal naïve model from Section 3.7 had an out-of-sample \(RMSE = 28.47\). Vanilla HW has an out-of-sample \(RMSE = 15.62\), while HW with dampening boasts \(RMSE = 12.07\). Thus the latter model is the winner.
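For reference, the seasonal naïve benchmark and its test-set RMSE can be sketched in base R without any forecasting package. This example uses the built-in AirPassengers series as a stand-in, since the retail file is not available here, so the numbers it produces are unrelated to the 28.47 quoted above.

```r
y <- AirPassengers                      # stand-in monthly series (built in)
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

m <- frequency(y)                       # seasonal period (12)
last_season <- as.numeric(tail(train, m))

# Seasonal naive: repeat the last observed year across the test period
fc <- rep(last_season, length.out = length(test))

# Test-set RMSE on the original scale
rmse_snaive <- sqrt(mean((as.numeric(test) - fc)^2))
rmse_snaive
```

Because the training set ends in December, the repeated values line up month for month with the test set.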

Take a look at the forecast:

autoplot(window(myts, end=c(2010, 12))) +
  autolayer(m6)

Problem No. 7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

First, apply a Box-Cox transformation to the data. We’ll need to find the \(\lambda\) value:

( lambda <- BoxCox.lambda(myts) )

## [1] 0.1116205

bc <- BoxCox(myts, lambda)

Second, apply STL and derive the seasonally adjusted data by combining the trend and remainder portions of the decomposed time series:

decomp <- stl(bc[,1], t.window=12, s.window='periodic', robust=TRUE)
myts_adj <- decomp$time.series[,2] + decomp$time.series[,3]
autoplot(myts_adj)

Finally, apply ETS on seasonally adjusted data:

m7 <- forecast(myts_adj)
autoplot(m7) + ggtitle('ETS on Box Cox + seasonally adjusted myts')

From the summary() output, this model has an \(AIC = 451.05\) and \(RMSE = 0.0886\). Compare to previous models:

	\(AIC\)	\(RMSE\)
\(M_3\) HW	\(3944.19\)	\(9.44\)
\(M_4\) HW dampened	\(3920.42\)	\(9.26\)
\(M_7\) ETS	\(451.05\)	\(0.09\)

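At face value, the \(AIC\) favors the ETS model on the seasonally adjusted, Box-Cox-transformed data. However, \(M_7\) was fit to a transformed and seasonally adjusted series, so its \(AIC\) and \(RMSE\) are on a different scale and are not directly comparable to those of \(M_3\) and \(M_4\). A fair comparison would back-transform the forecasts, restore the seasonal component, and compute the test-set \(RMSE\) on the original scale.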
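Because the ETS model was fit to Box-Cox-transformed values, its errors are in transformed units. The sketch below shows the transformation and its inverse in base R, using the \(\lambda\) found above. The helpers `box_cox()` and `inv_box_cox()` and the sample values are hypothetical, written only for illustration; the forecast package provides `BoxCox()` and `InvBoxCox()` for real use.

```r
# Hypothetical helpers for positive data
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

lambda <- 0.1116205               # value reported by BoxCox.lambda() above
y <- c(50, 100, 200, 400)         # made-up sales levels

w <- box_cox(y, lambda)
back <- inv_box_cox(w, lambda)    # round trip recovers y

# The same transformed-scale error maps to different original-scale errors
err_original <- inv_box_cox(w + 0.0886, lambda) - y
round(err_original, 2)
```

An error of 0.0886 on the transformed scale corresponds to a larger error in original units at higher sales levels, which is why the 0.09 figure cannot be read against the \(RMSE\) values of about 9 from the Holt-Winters models.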

DATA 624—Week No. 6

Ben Horvath

March 8, 2020
