A. Loading Packages
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
Consider the pigs series - the number of pigs slaughtered in Victoria each month (Jan 1980 - Aug 1995).
tsdisplay(pigs)
a. Use the ses() function in R to find the optimal values of and alpha and lambda-naught, and generate forecasts for the next four months.
fc.pigs <- ses(pigs, h=4)
print(fc.pigs$model)
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
Checking the SES model parameters, we note alpha (smoothing paramter) of 0.2971 and lambda-naught (initial value) of 77260.0561.
autoplot(fc.pigs) +
forecast::autolayer(fitted(fc.pigs), series="Fitted") +
ylab("Slaughtered Pigs/month") + xlab("Year")
b. Compute a 95% prediction interval for the first forecast using y-hat = +- 1.96, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
m.pigs <- mean(window(pigs, end=c(1995,4)))
print(paste0("The mean of the pigs values calculated from the data set is: ", round(m.pigs,3)))
## [1] "The mean of the pigs values calculated from the data set is: 90382.391"
print(paste0("The mean of the pigs values given by ses() model is: ", round(fc.pigs$mean[1],3)))
## [1] "The mean of the pigs values given by ses() model is: 98816.406"
s.pigs <- sd(fc.pigs$model$residuals)
print(paste0("The standard deviation of the residuals calculated from residuals is: ", round(s.pigs,3)))
## [1] "The standard deviation of the residuals calculated from residuals is: 10273.693"
print(paste0("The standard deviation of the residuals given by ses() model is: ", round(fc.pigs$model$sigma2^0.5,3)))
## [1] "The standard deviation of the residuals given by ses() model is: 10308.576"
The ses() model gives the 95% confidence interval for the first forecasted value as: [78611.97, 119020.8], verified below.
fc.pigs[c('upper','lower')]
## $upper
## 80% 95%
## Sep 1995 112027.4 119020.8
## Oct 1995 112598.3 119894.0
## Nov 1995 113146.5 120732.4
## Dec 1995 113674.4 121539.8
##
## $lower
## 80% 95%
## Sep 1995 85605.43 78611.97
## Oct 1995 85034.52 77738.83
## Nov 1995 84486.34 76900.46
## Dec 1995 83958.37 76092.99
fc.pigs$mean[1] - 1.96 * fc.pigs$model$sigma2^0.5
## [1] 78611.6
fc.pigs$mean[1] + 1.96 * fc.pigs$model$sigma2^0.5
## [1] 119021.2
Calculating the confidence interval by hand gives very different results, with a confidence interval of [70245.95, 110518.8].
m.pigs - 1.96 * s.pigs
## [1] 70245.95
m.pigs + 1.96 * s.pigs
## [1] 110518.8
The difference in these values can likely accounted for by the fact that the mean for the ses() model is also weight-adjusted according to the smoothing parameter.
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
a. Plot the series and discuss the main features of the data.
The first column gives daily sales of paperbacks, while the second is daily sales of hardcovers, for one month only.
data(books)
pb <- window(books[,1], end=26)
hc <- window(books[,2], end=26)
checkresiduals(books[,1])
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
checkresiduals(books[,2])
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
Applying the ets() function to both the paperback and hardcover time series, we note that the model gives the best approximations as follows: paperback: ETS(A,N,N) with additive error term, no trend nor seasonality hardcover: ETS(M,A,N) with multiplicative error term, additive trend and no seasonality
The trend in the hardcover, but not the paperback, data is supported by the autocorrelation function, showing significance up to two lags.
ets(books[,1])
## ETS(A,N,N)
##
## Call:
## ets(y = books[, 1])
##
## Smoothing parameters:
## alpha = 0.1684
##
## Initial states:
## l = 170.8464
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
ets(books[,2])
## ETS(M,A,N)
##
## Call:
## ets(y = books[, 2])
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 149.3465
## b = 3.2554
##
## sigma: 0.1462
##
## AIC AICc BIC
## 309.583 312.083 316.589
b. Use the ses() function to forecast each series, and plot the forecasts.
fc.pb.ses <- ses(pb, h=4)
autoplot(fc.pb.ses) +
forecast::autolayer(fitted(fc.pb.ses), series="Fitted") +
ylab("Daily Paperback sales") + xlab("Day of Month")
fc.hc.ses <- ses(hc, h=4)
autoplot(fc.hc.ses) +
forecast::autolayer(fitted(fc.hc.ses), series="Fitted") +
ylab("Daily Paperback sales") + xlab("Day of Month")
c. Compute the RMSE values for the training data in each case.
accuracy(fc.pb.ses,books[,1])
## ME RMSE MAE MPE MAPE MASE
## Training set 5.740754 34.07320 27.41353 -0.4571133 15.81877 0.6738824
## Test set 26.621870 33.88176 28.56094 11.3497551 12.38117 0.7020879
## ACF1 Theil's U
## Training set -0.1616501 NA
## Test set -0.4716847 0.9208597
The paperback training set has an RMSE of 34.07320, versus 33.88176 for the test data. That the RMSE for the test set is lower than the training set error at least shows that we are not overfitting.
accuracy(fc.hc.ses,books[,2])
## ME RMSE MAE MPE MAPE MASE
## Training set 7.017366 30.44212 25.53629 1.814926 13.34629 0.8111909
## Test set 24.843828 40.61285 32.29691 8.697859 12.40586 1.0259503
## ACF1 Theil's U
## Training set -0.09173184 NA
## Test set -0.71082349 0.7786365
The hardcover training set RMSE (30.44212) is lower than paperbacks’. The test set RMSE is 40.61285, about 33% higher.
a. Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
fc.pb.holt <- holt(pb, h=4)
autoplot(fc.pb.holt) +
forecast::autolayer(fitted(fc.pb.holt), series="Fitted") +
ylab("Daily Paperback Sales") + xlab("Day of Month")
fc.hc.holt <- holt(hc, h=4)
autoplot(fc.hc.holt) +
forecast::autolayer(fitted(fc.hc.holt), series="Fitted") +
ylab("Daily Hardcover Sales") + xlab("Day of Month")
The predicted values are given by the forward-looking means, listed here:
fc.pb.holt$mean
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## [1] 199.6435 200.7045 201.7654 202.8263
fc.hc.holt$mean
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## [1] 237.4084 240.7282 244.0481 247.3679
b. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
accuracy(fc.pb.holt,books[,1])
## ME RMSE MAE MPE MAPE MASE
## Training set -3.875031 31.94914 26.36988 -5.838267 16.06684 0.6482272
## Test set 17.265080 26.95439 24.14777 7.035506 10.69651 0.5936030
## ACF1 Theil's U
## Training set -0.1460121 NA
## Test set -0.4725323 0.7436385
RMSE improves from 33.88176 under simple exponential smoothing to 26.95439 under Holt’s linear method when performing 4-period forcasting for the test set. The training set error also improves from 34.07320 to 31.94914, as measured by RMSE.
accuracy(fc.hc.holt,books[,2])
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1887632 26.57365 22.29866 -2.205423 12.17368 0.7083436
## Test set -1.6381643 30.92770 28.59009 -2.404104 12.11819 0.9081984
## ACF1 Theil's U
## Training set 0.04855399 NA
## Test set -0.73601579 0.468241
By capturing the trend in the hardcover data, we see sizeable improvements in both training set (26.57365 vs. 30.44212) and test set (30.92770 vs. 40.61285) error, as measured by RMSE.
c. Compare the forecasts for the two series using both methods. Which do you think is best?
fc.pb.ses$mean
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## [1] 191.8781 191.8781 191.8781 191.8781
tail(books[,1],4)
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## [1] 222 217 188 247
ses: 191.8781 191.8781 191.8781 191.8781 holt: 199.6435 200.7045 201.7654 202.8263
Both models underestimate three of the paperbook values, although holt() less so by capturing some of the upward trend (slope is 1.0737).
fc.hc.ses$mean
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## [1] 215.9062 215.9062 215.9062 215.9062
tail(books[,2],4)
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## [1] 201 283 220 259
ses: 215.9062 215.9062 215.9062 215.9062 holt: 237.4084 240.7282 244.0481 247.3679
The hardcover data shows wide swings at the end of the time series, which are not predicted by either of the models. ses() does a better job predicting the below-mean values, while holt() does better with the above-mean values.
The holt() model appears to be the better choice from this cursory examination of the last 4 in-sample periods.
d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
fc.pb.ses[c('upper','lower')]
## $upper
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 237.3277 261.3873
## 28 237.7406 262.0188
## 29 238.1498 262.6446
## 30 238.5555 263.2650
##
## $lower
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 146.4285 122.3690
## 28 146.0156 121.7375
## 29 145.6064 121.1116
## 30 145.2008 120.4913
fc.pb.holt[c('upper','lower')]
## $upper
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 244.1549 267.7177
## 28 245.2158 268.7787
## 29 246.2767 269.8396
## 30 247.3376 270.9005
##
## $lower
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 155.1322 131.5693
## 28 156.1931 132.6303
## 29 157.2540 133.6912
## 30 158.3150 134.7521
m.pb <- mean(window(books[,1], end=length(books[,1])-4))
s.pb.ses <- sd(fc.pb.ses$model$residuals)
s.pb.holt <- sd(fc.pb.holt$model$residuals)
print(paste0("The mean of the paperback sales values calculated from the data set is: ", round(m.pb,3)))
## [1] "The mean of the paperback sales values calculated from the data set is: 181.462"
print(paste0("The standard deviation of the residuals calculated from residuals from ses() model is: ", round(s.pb.ses,3)))
## [1] "The standard deviation of the residuals calculated from residuals from ses() model is: 34.251"
print(paste0("The standard deviation of the residuals calculated from residuals from holt() model is: ", round(s.pb.holt,3)))
## [1] "The standard deviation of the residuals calculated from residuals from holt() model is: 32.341"
m.pb - 1.96 * s.pb.ses
## [1] 114.3291
m.pb + 1.96 * s.pb.ses
## [1] 248.594
m.pb - 1.96 * s.pb.holt
## [1] 118.0726
m.pb + 1.96 * s.pb.holt
## [1] 244.8505
Paperback series 95% confidence intervals
ses: [122.3690, 261.3873]
holt: [131.5693, 267.7177]
manual_ses: [114.3291, 248.594]
manual_holt: [118.0726, 244.8505]
fc.hc.ses[c('upper','lower')]
## $upper
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 256.5123 278.0079
## 28 259.3115 282.2888
## 29 261.9407 286.3099
## 30 264.4277 290.1134
##
## $lower
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 175.3000 153.8044
## 28 172.5009 149.5235
## 29 169.8716 145.5024
## 30 167.3847 141.6989
fc.hc.holt[c('upper','lower')]
## $upper
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 274.4307 294.0290
## 28 277.7505 297.3489
## 29 281.0703 300.6687
## 30 284.3901 303.9885
##
## $lower
## Time Series:
## Start = 27
## End = 30
## Frequency = 1
## 80% 95%
## 27 200.3862 180.7878
## 28 203.7060 184.1076
## 29 207.0258 187.4275
## 30 210.3457 190.7473
m.hc <- mean(window(books[,2], end=length(books[,1])-4))
s.hc.ses <- sd(fc.hc.ses$model$residuals)
s.hc.holt <- sd(fc.hc.holt$model$residuals)
print(paste0("The mean of the hardcover sales values calculated from the data set is: ", round(m.hc,3)))
## [1] "The mean of the hardcover sales values calculated from the data set is: 192.385"
print(paste0("The standard deviation of the residuals calculated from residuals from ses() model is: ", round(s.hc.ses,3)))
## [1] "The standard deviation of the residuals calculated from residuals from ses() model is: 30.209"
print(paste0("The standard deviation of the residuals calculated from residuals from holt() model is: ", round(s.hc.holt,3)))
## [1] "The standard deviation of the residuals calculated from residuals from holt() model is: 27.099"
m.hc - 1.96 * s.hc.ses
## [1] 133.1751
m.hc + 1.96 * s.hc.ses
## [1] 251.5941
m.hc - 1.96 * s.hc.holt
## [1] 139.2701
m.hc + 1.96 * s.hc.holt
## [1] 245.4991
Hardcover series 95% confidence intervals
ses: [153.8044, 278.0079]
holt: [180.7878, 294.0290]
manual_ses: [133.1751, 251.5941]
manual_holt: [139.2701, 245.4991]
Recall your retail time series data (from Exercise 3 in Section 2.10).
retaildata <- readxl::read_excel('retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349824F"], frequency=12, start=c(1982,4))
#tail(myts) #ends Dec 2013
lambda <- BoxCox.lambda(myts)
dframe <- cbind(RawSeries = myts, TransformedData = BoxCox(myts,lambda))
autoplot(dframe, facet=TRUE) +
xlab("Years") + ylab("Sales") +
ggtitle("Australian Retail Data (ID: A3349824F), \n power parameter lambda = ", round(lambda,5))
a. Why is multiplicative seasonality necessary for this series?
When we apply the ets() function, we observe it selects the ETS(M,Ad,M) model as the best fit. That indicated multiplicative error, damped additive trend with multiplicative seasonality.
ets(myts)
## ETS(M,Ad,M)
##
## Call:
## ets(y = myts)
##
## Smoothing parameters:
## alpha = 0.4886
## beta = 0.0155
## gamma = 0.1357
## phi = 0.98
##
## Initial states:
## l = 54.5757
## b = 0.7204
## s = 0.9373 0.8911 0.9461 1.4023 1.0547 1.0183
## 0.958 0.9831 0.9632 0.9424 1.0189 0.8847
##
## sigma: 0.0496
##
## AIC AICc BIC
## 4022.269 4024.159 4093.240
This is further supported by this plot, which shows an increase in volatility when the values are seasonally-differenced by a lag of 12.
autoplot(diff(myts,12)) +
xlab("Year") + ylab("") +
ggtitle("Seasonally-differenced Retail Data")
b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
myts.train <- window(myts, end=c(2005,8))
myts.test <- window(myts, start=c(2005,9))
#length(myts.test)
fc.myts.hw <- hw(myts.train, seasonal="multiplicative", h=100)
fc.myts.damped <- hw(myts.train, seasonal="multiplicative", damped=TRUE, h=100)
autoplot(myts, series='Raw data') +
autolayer(fc.myts.hw, series="HW multiplicative seasonality", PI=FALSE) +
autolayer(fc.myts.damped, series="HW dampened trend", PI=FALSE)
fc.myts.damped$model
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = myts.train, h = 100, seasonal = "multiplicative", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.4766
## beta = 0.0169
## gamma = 0.1427
## phi = 0.98
##
## Initial states:
## l = 54.2878
## b = 0.5522
## s = 0.9401 0.8981 1.0211 1.3581 1.0453 1.0216
## 0.9537 0.9578 0.9487 0.9822 0.9888 0.8844
##
## sigma: 0.0564
##
## AIC AICc BIC
## 2803.867 2806.478 2869.358
When we use a dampened model, the dampening parameter is determined to be phi = 0.98.
c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
print("Holt-Winters Model Errors with Multiplicative Seasonality")
## [1] "Holt-Winters Model Errors with Multiplicative Seasonality"
accuracy(fc.myts.hw, myts)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.812558 9.403028 6.89913 0.3976988 4.191641 0.4147558
## Test set -26.078074 43.684165 36.75501 -6.4274086 8.886152 2.2096050
## ACF1 Theil's U
## Training set -0.08687436 NA
## Test set 0.89762015 0.9498884
print("")
## [1] ""
print("Holt-Winters Model Errors with Multiplicative Seasonality and Dampened Trend")
## [1] "Holt-Winters Model Errors with Multiplicative Seasonality and Dampened Trend"
accuracy(fc.myts.damped, myts)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.9000269 9.425183 6.947893 0.3603983 4.217404 0.4176872
## Test set 2.5815068 35.540305 30.946366 -0.1863315 7.152830 1.8604059
## ACF1 Theil's U
## Training set -0.09315505 NA
## Test set 0.89976891 0.7207306
The training set RMSE is slightly less for the Holt Winters model with multiplicative seasonality (9.403028 vs. 9.425183), while the model with dampened trend holds up better when looking at the test set. There is a large jump in error under both models, suggesting weak fitting out of sample.
d. Check that the residuals from the best method look like white noise.
checkresiduals(fc.myts.hw)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 37.559, df = 8, p-value = 9.08e-06
##
## Model df: 16. Total lags used: 24
checkresiduals(fc.myts.damped)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 40.104, df = 7, p-value = 1.202e-06
##
## Model df: 17. Total lags used: 24
The low p-values and high Q* for the Ljung-Box test leads us to reject the hypothesis that the data are not significantly different from white noise.
e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
myts2.train <- window(myts, end=c(2010,12))
#length(myts2.train)
myts2.test <- window(myts, start=c(2011,1))
fc.myts.snaive <- snaive(myts2.train, h=length(myts2.test))
accuracy(fc.myts.snaive, myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 12.11051 25.24173 18.71471 6.299482 8.728705 1.000000
## Test set 83.08611 90.62467 83.08611 17.030562 17.030562 4.439614
## ACF1 Theil's U
## Training set 0.7639057 NA
## Test set 0.7944137 2.03766
fc.myts.hw2 <- hw(myts2.train, seasonal="multiplicative", h=length(myts2.test))
accuracy(fc.myts.hw2, myts2.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.3443698 10.02954 7.474705 -0.2325084 4.031062 0.3994026
## Test set 54.5217782 61.42603 54.799584 11.0025099 11.073560 2.9281549
## ACF1 Theil's U
## Training set 0.03814269 NA
## Test set 0.80369891 1.357665
autoplot(myts) +
autolayer(fc.myts.snaive, series="Seasonal Naive", PI=FALSE) +
autolayer(fc.myts.hw2, series="Holt-Winters with Multiplicative Seasonality", PI=FALSE)
The Holt-Winters model beats the Seasonally-naive one in terms of RMSE of the training set (10.02954 vs 25.24173), as well as the test set (61.42603 vs. 90.62467). However there is a large jump in the RMSE for the test set for both, suggesting poor fit of the models.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
lambda <- BoxCox.lambda(myts2.train)
myts.bc <- BoxCox(myts2.train,lambda)
#length(myts.bc)
Applying the ets() function, as expected, we now have no seasonality (thanks to seasonal adjustment), and the error and trends are both additive (thanks to the box-cox transformation).
myts.bc %>% mstl() -> fit #automatic STL model
(fc.myts.bc.ssn <- ets(seasadj(fit)))
## ETS(A,A,N)
##
## Call:
## ets(y = seasadj(fit))
##
## Smoothing parameters:
## alpha = 0.511
## beta = 1e-04
##
## Initial states:
## l = 6.0593
## b = 0.015
##
## sigma: 0.1146
##
## AIC AICc BIC
## 526.9949 527.1719 546.2127
To evaluate the forecasts, we have to back-transform this data, reversing the Box-Cox transformation, before subjecting it to the Holt model.
myts.backtrack <- (lambda * fc.myts.bc.ssn$x + 1) ^ (1/lambda)
fc.myts.bt.holt <- holt(myts.backtrack, h=length(myts2.test))
accuracy(fc.myts.bt.holt, myts2.test)
## ME RMSE MAE MPE MAPE
## Training set 0.002605517 8.973722 6.59631 -0.3164603 3.362551
## Test set 53.437754918 79.672319 60.12208 9.8023328 11.569292
## MASE ACF1 Theil's U
## Training set 0.3510053 -0.03134838 NA
## Test set 3.1992386 0.55913689 1.683323
The training set RMSE for the Box-Cox transformed, seasonally-adjusted model is 8.973722, better than the previous two models. However, the test set error (79.672319) is higher than the Holt-Winter Seasonally adjusted model (although better than the seasonally naive one)