Textbook: Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia.
#Load packages
library(tidyverse)
library(fpp2)
library(kableExtra)
Consider the pigs series — the number of pigs slaughtered in Victoria each month.
a) Use the ses() function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
# Fit simple exponential smoothing and forecast four months ahead
ses_pigs <- ses(pigs, h = 4)
summary(ses_pigs)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
The optimal values are \(\alpha = 0.2971\) and initial level \(\ell_0 = 77260.0561\). These are used to generate the forecasts for the next four months.
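As a sanity check, the flat SES forecast can be reproduced by hand: the level updates as \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\), and every forecast equals the final level. A minimal sketch, assuming the fitted ets object exposes its parameters through $model$par:
# Recompute the SES level recursion from the fitted parameters
alpha <- ses_pigs$model$par["alpha"]
level <- ses_pigs$model$par["l"]   # initial level l_0
for (y in pigs) {
  level <- alpha * y + (1 - alpha) * level
}
unname(level)   # should match ses_pigs$mean[1], about 98816.41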
autoplot(ses_pigs) +
autolayer(fitted(ses_pigs), series = "Fitted") +
ylab("Number Of Pigs Slaughtered") +
theme_minimal()
The chart shows the prediction intervals for the number of pigs forecast to be slaughtered over the next four months.
b) Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96 \sigma\) where \(\sigma\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
#computing the intervals using R
print('The 95% prediction interval computed by R is')
## [1] "The 95% prediction interval computed by R is"
ses_pigs$upper[1,2]
## 95%
## 119020.8
ses_pigs$lower[1,2]
## 95%
## 78611.97
#computing the intervals using formula
s <- sd(residuals(ses_pigs))
print('The 95% prediction interval using the formula is')
## [1] "The 95% prediction interval using the formula is"
ses_pigs$mean[1] + 1.96*s
## [1] 118952.8
ses_pigs$mean[1] - 1.96*s
## [1] 78679.97
The two intervals are very close. The small discrepancy arises because R's interval uses the model's estimate of \(\sigma\) (which adjusts for the number of estimated parameters), not the sample standard deviation of the residuals.
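The two estimates of \(\sigma\) can be compared directly; a quick check, assuming the ets object stores the residual variance in $model$sigma2:
# Model-based sigma (what R uses) vs the sample sd of the residuals
sqrt(ses_pigs$model$sigma2)   # should match the sigma in summary(ses_pigs), 10308.58
sd(residuals(ses_pigs))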
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
a) Plot the series and discuss the main features of the data.
summary(books)
## Paperback Hardcover
## Min. :111.0 Min. :128.0
## 1st Qu.:167.2 1st Qu.:170.5
## Median :189.0 Median :200.5
## Mean :186.4 Mean :198.8
## 3rd Qu.:207.2 3rd Qu.:222.0
## Max. :247.0 Max. :283.0
autoplot(books) +
labs(title = "Daily Sales of Paperback and Hardcover Books",
x = "Day",
y = "Sales") +
theme_minimal()
The dataset contains 30 days of sales of paperback and hardcover books. Sales in both categories show an increasing trend. There is a lot of day-to-day fluctuation, but the data do not exhibit seasonal or cyclic patterns.
b) Use the ses() function to forecast each series, and plot the forecasts.
#Forecast each series, and plot the forecasts.
sespaperback <- ses(books[, "Paperback"], h = 4)
seshardcover <- ses(books[, "Hardcover"], h = 4)
#Plot
autoplot(books[, "Paperback"], series = "Paperback") +
autolayer(sespaperback, series = "Paperback", PI = FALSE) +
autolayer(books[, "Hardcover"], series = "Hardcover") +
autolayer(seshardcover, series = "Hardcover", PI = FALSE) +
ylab("Sales amount") +
ggtitle("Sales of paperback and hardcover books")+
theme_minimal()
Both SES forecasts are flat; the upward trend has not been captured. This is expected, since the SES forecast function is constant: \(\hat{y}_{T+h|T} = \ell_T\) for all \(h\).
c) Compute the RMSE values for the training data in each case.
print('RMSE for paperback')
## [1] "RMSE for paperback"
sqrt(mean(sespaperback$residuals^2))
## [1] 33.63769
print('RMSE for hardcover')
## [1] "RMSE for hardcover"
sqrt(mean(seshardcover$residuals^2))
## [1] 31.93101
RMSE for hardcover is lower than the RMSE for paperback.
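The same figures can be read off accuracy(), which reports the RMSE alongside the other training-set error measures:
# Equivalent: RMSE from the accuracy() output
accuracy(sespaperback)[, "RMSE"]
accuracy(seshardcover)[, "RMSE"]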
We will continue with the daily sales of paperback and hardcover books in data set books.
a) Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
#apply holt
holt_paperback <- holt(books[, "Paperback"], h = 4)
holt_hardcover <- holt(books[, "Hardcover"], h = 4)
#plot
autoplot(books) +
autolayer(holt_paperback, series="Paperback", PI=FALSE)+
autolayer(holt_hardcover, series="Hardcover", PI=FALSE)+
ggtitle("Forecasts of sales of paperback and hardcover books")+
theme_minimal()
The Holt method was able to capture the upward trend in the data.
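Holt's (undamped) forecast function is a straight line from the final states, \(\hat{y}_{T+h|T} = \ell_T + h\,b_T\). As an illustration, the four paperback forecasts can be reconstructed by hand, assuming the states matrix uses the forecast package's "l" and "b" columns:
# Rebuild the four paperback forecasts from the final level and slope
states <- holt_paperback$model$states
lT <- states[nrow(states), "l"]
bT <- states[nrow(states), "b"]
lT + (1:4) * bT   # should match holt_paperback$mean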
b) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
#RMSE for both series
print('RMSE for paperback')
## [1] "RMSE for paperback"
accuracy(holt_paperback)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
print('RMSE for hardcover')
## [1] "RMSE for hardcover"
accuracy(holt_hardcover)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
The RMSE of both series decreased compared to the SES method. Holt's method fits better and improves on SES by capturing the upward trend.
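Since Holt's method uses one more parameter than SES, an in-sample comparison should penalize the extra parameter, which the AICc does. A quick check, assuming the fitted ets objects store it in $model$aicc:
# Compare AICc, which penalizes Holt's extra parameter
c(SES = sespaperback$model$aicc, Holt = holt_paperback$model$aicc)
c(SES = seshardcover$model$aicc, Holt = holt_hardcover$model$aicc)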
c) Compare the forecasts for the two series using both methods. Which do you think is best?
#Paperback
autoplot(books[,1])+
autolayer(sespaperback, series="Simple Exponential Smoothing", PI=FALSE)+
autolayer(holt_paperback, series= 'Holt Method', PI=FALSE) +
labs(title="Paperback Sales: Exponential Smoothing vs Holt Method")+
theme_minimal()
#Hardcover
autoplot(books[,2])+
autolayer(seshardcover, series="Simple Exponential Smoothing", PI=FALSE)+
autolayer(holt_hardcover, series= 'Holt Method', PI=FALSE) +
labs(title="Hardcover Sales: Exponential Smoothing vs Holt Method")+
theme_minimal()
The SES method predicts constant values, while Holt's method captures the trend in both series and fits the data better. The lower RMSE indicates that the fitted values are closer to the observed ones under Holt's method.
d) Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
Using RMSE
data.frame(PI = c("Paperback - SES", "Hardcover - SES", "Paperback - Holt",
"Hardcover - Holt"),
Lower = c(sespaperback$mean[1] - 1.96*accuracy(sespaperback)[2],
seshardcover$mean[1] - 1.96*accuracy(seshardcover)[2],
holt_paperback$mean[1] - 1.96*accuracy(holt_paperback)[2],
holt_hardcover$mean[1] - 1.96*accuracy(holt_hardcover)[2]),
Upper = c(sespaperback$mean[1] + 1.96*accuracy(sespaperback)[2],
seshardcover$mean[1] + 1.96*accuracy(seshardcover)[2],
holt_paperback$mean[1] + 1.96*accuracy(holt_paperback)[2],
holt_hardcover$mean[1] + 1.96*accuracy(holt_hardcover)[2]))%>%
kbl(caption = '95% Prediction Interval by RMSE') %>%
kable_styling(bootstrap_options = c('striped', 'hover'))
| PI | Lower | Upper |
|---|---|---|
| Paperback - SES | 141.1798 | 273.0395 |
| Hardcover - SES | 176.9753 | 302.1449 |
| Paperback - Holt | 148.4384 | 270.4951 |
| Hardcover - Holt | 196.8745 | 303.4733 |
Using R functions
data.frame(PI = c("Paperback - SES", "Hardcover - SES", "Paperback - Holt",
"Hardcover - Holt"),
Lower = c(sespaperback$lower[1, '95%'],
seshardcover$lower[1, '95%'],
holt_paperback$lower[1, '95%'],
holt_hardcover$lower[1, '95%']),
Upper = c(sespaperback$upper[1,'95%'],
seshardcover$upper[1,'95%'],
holt_paperback$upper[1,'95%'],
holt_hardcover$upper[1,'95%']))%>%
kbl(caption = '95% Prediction Interval by R') %>%
kable_styling(bootstrap_options = c('striped', 'hover'))
| PI | Lower | Upper |
|---|---|---|
| Paperback - SES | 138.8670 | 275.3523 |
| Hardcover - SES | 174.7799 | 304.3403 |
| Paperback - Holt | 143.9130 | 275.0205 |
| Hardcover - Holt | 192.9222 | 307.4256 |
The intervals from the two approaches are very close, with R's being slightly wider because it uses the model's estimate of \(\sigma\) rather than the training RMSE.
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with the damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h = 100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
# Fit Holt's method with different options
main <- holt(eggs, h=100)                             # basic linear trend
damped <- holt(eggs, damped=TRUE, h=100)              # damped trend
exponential <- holt(eggs, exponential = TRUE, h=100)  # exponential trend
lambda <- holt(eggs, lambda='auto', damped=TRUE, h=100)             # Box-Cox transformed, damped
exp_damped <- holt(eggs, exponential = TRUE, damped = TRUE, h=100)  # exponential, damped
boxcox_damped <- holt(eggs, lambda='auto', damped=TRUE, biasadj = TRUE, h=100)  # Box-Cox, damped, bias-adjusted
# Creating plot
autoplot(eggs)+
autolayer(main, series='Basic', PI=FALSE)+
autolayer(damped, series='damped', PI=FALSE)+
autolayer(exponential, series='exponential', PI=FALSE)+
autolayer(lambda, series='lambda', PI=FALSE)+
autolayer(exp_damped, series='damped, exponential', PI=FALSE)+
autolayer(boxcox_damped, series='damped, boxcox transformed', PI=FALSE)+
labs(title='Forecasts using different Methods')+
scale_y_continuous()+
theme_minimal()
The different methods produce very different forecasts, ranging from negative prices to over 100. Let us look at the accuracy statistics of the different models.
accuracy_table <- rbind(accuracy(main), accuracy(damped), accuracy(exponential),
accuracy(lambda), accuracy(exp_damped), accuracy(boxcox_damped))
rownames(accuracy_table) <- c('Basic', 'Damped', 'Exponential', 'Lambda', 'Damped, Exponential', 'Damped, Box-Cox Transformed')
accuracy_table %>%
kable(caption='Accuracy measures using different methods') %>%
kable_styling(bootstrap_options = c('striped', 'hover'))
| ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
|---|---|---|---|---|---|---|---|
| Basic | 0.0449909 | 26.58219 | 19.18491 | -1.142201 | 9.653791 | 0.9463626 | 0.0134820 |
| Damped | -2.8914955 | 26.54019 | 19.27950 | -2.907633 | 10.018944 | 0.9510287 | -0.0031954 |
| Exponential | 0.4918791 | 26.49795 | 19.29399 | -1.263235 | 9.766049 | 0.9517436 | 0.0103908 |
| Lambda | -0.8200445 | 26.53321 | 19.45654 | -2.019718 | 9.976131 | 0.9597618 | 0.0058524 |
| Damped, Exponential | -0.9089678 | 26.59113 | 19.54973 | -2.125756 | 10.023283 | 0.9643590 | 0.0137612 |
| Damped, Box-Cox Transformed | -1.8062134 | 26.58589 | 19.55896 | -2.584250 | 10.117605 | 0.9648141 | 0.0053221 |
The exponential method has the lowest RMSE. It captures the downward trend without producing a negative price, since an exponential (multiplicative) trend decays towards zero rather than crossing it.
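Rather than scanning the table by eye, the best method can be picked programmatically:
# Identify the method with the lowest training RMSE
rownames(accuracy_table)[which.min(accuracy_table[, "RMSE"])]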
Recall your retail time series data (from Exercise 3 in Section 2.10).
retaildata = readxl::read_excel("retail.xlsx", skip = 1)
myts = ts(retaildata[, 15], frequency = 12, start = c(1982,4))
a) Why is multiplicative seasonality necessary for this series?
autoplot(myts) +
labs(title = "New South Wales - Other Recreational Goods Retailing",
x = "Year", y = "Sales")+
theme_minimal()
The seasonal variation is not constant: it grows with the level of the series. Multiplicative seasonality, which scales the seasonal component with the level, is therefore preferred.
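One way to corroborate this is to let ets() select the model components automatically; a multiplicative seasonal component (a final "M" in the model name, e.g. ETS(M,A,M)) would support this choice. A sketch, noting that the selected model depends on the series:
# ets() chooses the error/trend/season components by AICc
fit_auto <- ets(myts)
fit_auto$method   # a trailing "M" indicates multiplicative seasonality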
b) Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
model1 = hw(myts, seasonal = 'multiplicative', h = 100)
model2 = hw(myts, seasonal = 'multiplicative', damped = TRUE, h = 100)
autoplot(model1) +
labs(title = 'Multiplicative ', x = "Year", y = "Sales") +
theme_minimal()
autoplot(model2) +
labs(title = 'Multiplicative and Damped', x = "Year", y = "Sales") +
theme_minimal()
When damping is applied, the trend increases more slowly and steadily than in the model without damping, and the prediction intervals fan out less over the long forecast horizon.
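The strength of the damping can also be set manually through the phi argument of hw() (constrained to roughly 0.8–0.98); smaller values flatten the trend faster. A sketch:
# Force heavier damping than the estimated phi to see its effect
model2_phi <- hw(myts, seasonal = 'multiplicative', damped = TRUE,
phi = 0.9, h = 100)
autoplot(model2_phi) +
labs(title = 'Multiplicative and Damped, phi = 0.9', x = "Year", y = "Sales") +
theme_minimal()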
c) Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
data.frame(Method = c('Multiplicative', 'Multiplicative and Damped'),
RMSE = c(accuracy(model1)[2],
accuracy(model2)[2])) %>%
kable(caption='RMSE') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed'), full_width = FALSE)
| Method | RMSE |
|---|---|
| Multiplicative | 4.512297 |
| Multiplicative and Damped | 4.561897 |
Holt-Winters’ multiplicative method without a damping factor seems to be a better fit with a lower RMSE. The difference is quite small though.
d) Check that the residuals from the best method look like white noise.
checkresiduals(model1)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 72.086, df = 8, p-value = 1.887e-12
##
## Model df: 16. Total lags used: 24
The histogram shows nearly normal residuals with a slight positive skew. Based on the Ljung-Box test (p-value ≈ 1.9e-12), the residuals are distinguishable from white noise. The ACF plot shows 7 of 36 spikes (about 20%) outside the critical bounds, indicating remaining autocorrelation in the residuals.
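The same test can be run directly with Box.test(), matching checkresiduals()'s choice of 24 lags and 16 model degrees of freedom:
# Ljung-Box test by hand: 24 lags, fitdf = 16 fitted parameters
Box.test(residuals(model1), lag = 24, fitdf = 16, type = "Ljung-Box")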
e) Now find the test set RMSE while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?
We will split the data and fit the training set with four methods: seasonal naive, Holt-Winters' multiplicative (with and without a damped trend), and Holt-Winters' additive with a Box-Cox transformation.
#Training and test datasets
training <- window(myts, end=c(2010,12))
test <- window(myts, start=c(2011,1))
#Plot
autoplot(myts)+
autolayer(training, series='Training')+
autolayer(test, series='Test')+
labs(title = 'Train and Test set', x = 'Year', y = 'Sales')+
theme_minimal()
#Fitting the models
fit_snaive <- snaive(training, h=36)   # named to avoid masking snaive()
multiplicative <- hw(training, h=36, seasonal='multiplicative', damped=F)
additive <- hw(training, h=36, seasonal='additive', damped=F, lambda='auto')
mult_damp = hw(training, seasonal = "multiplicative", damped = TRUE, h = 36)
autoplot(test, series='Test')+
autolayer(fit_snaive, series='Seasonal Naive', PI=FALSE)+
autolayer(multiplicative, series="Holt-Winters' Multiplicative", PI=FALSE)+
autolayer(mult_damp, series="Holt-Winters' Multiplicative Damped", PI=FALSE)+
autolayer(additive, series="Holt-Winters' Additive", PI=FALSE)+
labs(title = 'Comparison of models', x = 'Year', y = 'Sales') +
theme_minimal()
Let us check the RMSE values for the four models.
df = data.frame(RMSE = cbind(accuracy(fit_snaive, test)[,2],
accuracy(additive, test)[,2],
accuracy(multiplicative, test)[,2],
accuracy(mult_damp, test)[,2]))
names(df) = c('Seasonal Naive', 'Additive', 'Multiplicative', 'Multiplicative damped')
df%>%
kable(caption='RMSE for different models') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed'), full_width = FALSE)
| | Seasonal Naive | Additive | Multiplicative | Multiplicative damped |
|---|---|---|---|---|
| Training set | 6.498441 | 3.734103 | 3.744601 | 3.788172 |
| Test set | 14.858069 | 15.102439 | 10.870610 | 12.332173 |
The conclusion is the same as before: with the lowest test-set RMSE, Holt-Winters' multiplicative method without a damping factor fits this time series best, and it comfortably beats the seasonal naive approach.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
# STL decomposition of the Box-Cox transformed series, forecast with ETS
lambda <- BoxCox.lambda(training)
model_stl <- stlf(training, lambda = lambda)
# ETS fitted directly to the seasonally adjusted series
model_ets <- ets(seasadj(decompose(training, 'multiplicative')))
# Plot (model_stl is already a forecast object, so it can be layered directly)
autoplot(training, series = 'Train') +
autolayer(model_stl, series = 'STL Forecast', PI = FALSE) +
autolayer(forecast(model_ets, h = 24), series = 'ETS Forecast', PI = FALSE) +
autolayer(test, series = 'Test') +
theme_minimal() +
labs(title = 'Forecasts using different models', x = 'Year', y = 'Sales')
accuracy(model_stl)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.029466 2.993223 2.212575 -0.107248 3.677062 0.4517952
## ACF1
## Training set 0.08742471
accuracy(model_ets)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.4551271 3.612001 2.708557 0.6385719 4.446686 0.5497103 0.1207611
In both cases, we see a much lower training RMSE than for the previous models.
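These are training-set figures, though, and the exercise asks for a test-set comparison. For the STL model this is direct once the forecast horizon matches the test window; the ETS model was fitted to seasonally adjusted data, so its forecasts would need to be reseasonalized before a fair comparison. A sketch for the STL model:
# Test-set RMSE for the STL + ETS (stlf) model, comparable to part e)
model_stl_h <- stlf(training, h = length(test), lambda = lambda)
accuracy(model_stl_h, test)["Test set", "RMSE"]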