Data 624 HW5: Exponential Smoothing
library(tidyverse)
library(fpp2)
library(rio)
library(gridExtra)
library(scales)
library(naniar)
library(missForest)
1 HW5: Exponential Smoothing
Do exercises 7.1, 7.5, 7.6, 7.7, 7.8, and 7.9 in Hyndman. Please submit both the link to your RPubs and the .rmd file.
1.1 Ex. 7.1
Consider the pigs series — the number of pigs slaughtered in Victoria each month.
(a.) Use the ses() function in \(R\) to find the optimal values of \(\alpha\) and \(l_{0}\), and generate forecasts for the next four months.
(b.) Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by \(R\).
1.1.1 Part a
Use the ses() function in \(R\) to find the optimal values of \(\alpha\) and \(l_{0}\), and generate forecasts for the next four months.
Answer:
From the ses() output below, we get \(\alpha = 0.2971\) and \(l_{0} = 77260.0561\). The forecasts for the next four months are shown below in table form and in graph form.
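The model was fit with ses() (a reconstruction from the summary output below; the object name pigs_ses matches the plotting code that follows):
pigs_ses <- ses(pigs, h=4) #optimal alpha and l0 are estimated automatically
summary(pigs_ses)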
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
## ACF1
## Training set 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
autoplot(pigs_ses) +
autolayer(fitted(pigs_ses), series="Fitted") +
xlab("Year") + ylab("Number of Pigs Slaughtered in Victoria")1.1.2 Part b
Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by \(R\).
Answer:
From the \(R\) output below, the 95% prediction interval for the first forecast is \([78611.97, 119020.84]\).
By calculation using \(\hat{y}\pm 1.96s\), the 95% prediction interval for the first forecast is \([78679.97, 118952.84]\).
The formula-based interval is slightly narrower than the one produced by \(R\): the residual standard deviation \(s \approx 10273.7\) is a little smaller than the model’s \(\sigma = 10308.58\) reported in the summary, which is what \(R\) uses for its intervals.
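A minimal sketch of the comparison, assuming the pigs_ses object from Part a (the exact code behind the output below was not preserved):
c(pigs_ses$lower[1, "95%"], pigs_ses$upper[1, "95%"]) #interval produced by R
s <- sd(residuals(pigs_ses)) #standard deviation of the residuals
c(pigs_ses$mean[1] - 1.96*s, pigs_ses$mean[1] + 1.96*s) #interval from the formula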
## 95% 95%
## 78611.97 119020.84
## [1] 78679.97 118952.84
1.2 Ex. 7.5
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
(a.) Plot the series and discuss the main features of the data.
(b.) Use the ses() function to forecast each series, and plot the forecasts.
(c.) Compute the RMSE values for the training data in each case.
1.2.1 Part a
Plot the series and discuss the main features of the data.
Answer:
Over the 30 days, there is an increasing trend in both the paperback and hardcover series.
Because of the large day-to-day fluctuations in sales, it is hard to discern any seasonal or cyclic pattern in either series.
#books
autoplot(books) +
xlab("Day") + ylab("Number of Books") +
ggtitle("Daily Sales of Paperback and Hardcover Books at the same store")1.2.2 Part b
Use the ses() function to forecast each series, and plot the forecasts.
Answer:
- The forecasts for both paperback and hardcover books using the SES method are flat: SES forecasts equal the last estimated level, \(\hat{y}_{T+h|T} = \ell_{T}\), for every horizon \(h\).
pb_ses <- ses(books[, "Paperback"], h=4)
#summary(pb_ses)
hc_ses <- ses(books[, "Hardcover"], h=4)
#summary(hc_ses)
autoplot(books) +
autolayer(pb_ses, series="Paperback", alpha=0.6) +
autolayer(hc_ses, series="Hardcover", alpha=0.4) +
xlab("Day") + ylab("Number of Books") +
ggtitle("Daily Sales of Paperback and Hardcover Books at the same store")1.2.3 Part c
Compute the RMSE values for the training data in each case.
Answer:
The RMSE for the ses() Paperback training data is 33.6377, and the RMSE for the ses() Hardcover training data is 31.9310.
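For reference, RMSE is the second entry of the accuracy() matrix; a sketch of the calls behind the output below:
print(paste("The RMSE for the `ses` Paperback books training data is", accuracy(pb_ses)[2]))
print(paste("The RMSE for the `ses` Hardcover books training data is", accuracy(hc_ses)[2]))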
## [1] "The RMSE for the `ses` Paperback books training data is 33.637686782912"
## [1] "The RMSE for the `ses` Hardcover books training data is 31.9310149844547"
1.3 Ex. 7.6
(a.) Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
(b.) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
(c.) Compare the forecasts for the two series using both methods. Which do you think is best?
(d.) Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
1.3.1 Part a
Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
Answer:
- There is an increasing trend in both forecasts produced by Holt’s linear method.
pb_holt <- holt(books[, "Paperback"], h=4)
#summary(pb_holt)
hc_holt <- holt(books[, "Hardcover"], h=4)
#summary(hc_holt)
autoplot(books) +
autolayer(pb_holt, series="Paperback", alpha=0.6) +
autolayer(hc_holt, series="Hardcover", alpha=0.4) +
xlab("Day") + ylab("Number of Books") +
ggtitle("Daily Sales of Paperback and Hardcover Books at the same store")1.3.2 Part b
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
Answer:
The RMSE values of Holt’s method for the two series are smaller than those of simple exponential smoothing, so Holt’s method fits the training data better.
Holt’s linear method uses one more smoothing parameter than SES (\(\beta^{*}\)) and adds \(b_{t}\), an estimate of the trend (slope) of the series at time \(t\). This allows it to forecast data with a trend, which both of these series have (an increasing one).
Therefore, Holt’s linear method performs better than simple exponential smoothing for these data.
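For reference, Holt’s linear method (textbook Sec. 7.2) consists of a forecast equation plus level and trend equations:
\[
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_{t} + h b_{t} \\
\ell_{t} &= \alpha y_{t} + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_{t} &= \beta^{*}(\ell_{t} - \ell_{t-1}) + (1-\beta^{*})\, b_{t-1}
\end{aligned}
\]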
print(paste("The RMSE for the SES Paperback and Hardcover training data is",
accuracy(pb_ses)[2],"and",accuracy(hc_ses)[2]))## [1] "The RMSE for the SES Paperback and Hardcover training data is 33.637686782912 and 31.9310149844547"
print(paste("The RMSE for the Holt's Paperback and Hardcover training data is",
accuracy(pb_holt)[2],"and",accuracy(hc_holt)[2]))## [1] "The RMSE for the Holt's Paperback and Hardcover training data is 31.1369230162347 and 27.1935779818511"
1.3.3 Part c
Compare the forecasts for the two series using both methods. Which do you think is best?
Answer:
From the graph below, Holt’s linear method captures and projects the trend in each series, whereas SES produces flat forecasts from the same data. Holt’s method also has the smaller training RMSE for both series.
Therefore, Holt’s linear method is the better of the two for these data.
autoplot(books) +
autolayer(pb_ses, series="SES_Paperback", PI=FALSE) +
autolayer(hc_ses, series="SES_Hardcover", PI=FALSE) +
autolayer(pb_holt, series="Holt_Paperback", PI=FALSE) +
autolayer(hc_holt, series="Holt_Hardcover", PI=FALSE) +
xlab("Day") + ylab("Number of Books") +
ggtitle("Daily Sales of Paperback and Hardcover Books at the same store")1.3.4 Part d
Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
Answer:
- Using the RMSE values and assuming normal errors, the 95% prediction intervals are:
| 95% Prediction Interval | Lower 95 | Upper 95 |
|---|---|---|
| SES-Paperback | 141.1798 | 273.0395 |
| SES-Hardcover | 176.9753 | 302.1449 |
| Holt-Paperback | 148.4384 | 270.4951 |
| Holt-Hardcover | 196.8745 | 303.4733 |
- Using \(R\), the 95% prediction intervals are:
| 95% Prediction Interval | Lower 95 | Upper 95 |
|---|---|---|
| SES-Paperback | 138.8670 | 275.3523 |
| SES-Hardcover | 174.7799 | 304.3403 |
| Holt-Paperback | 143.9130 | 275.0205 |
| Holt-Hardcover | 192.9222 | 307.4256 |
#by calculation using RMSE and assuming normal errors
print(paste("95% PI of SES Paperback: (",pb_ses$mean[1]-1.96*accuracy(pb_ses)[2],",",
pb_ses$mean[1]+1.96*accuracy(pb_ses)[2],")"))
print(paste("95% PI of SES Hardcover: (",hc_ses$mean[1]-1.96*accuracy(hc_ses)[2],",",
hc_ses$mean[1]+1.96*accuracy(hc_ses)[2],")"))
print(paste("95% PI of Holt's Paperback: (",pb_holt$mean[1]-1.96*accuracy(pb_holt)[2],",",
pb_holt$mean[1]+1.96*accuracy(pb_holt)[2],")"))
print(paste("95% PI of Holt's Hardcover: (",hc_holt$mean[1]-1.96*accuracy(hc_holt)[2],",",
hc_holt$mean[1]+1.96*accuracy(hc_holt)[2],")"))
#by R
print(paste("95% PI of SES Paperback: (",pb_ses$lower[1, "95%"],",",pb_ses$upper[1, "95%"],")"))
print(paste("95% PI of SES Hardcover: (",hc_ses$lower[1, "95%"],",",hc_ses$upper[1, "95%"],")"))
print(paste("95% PI of Holt's Paperback: (",pb_holt$lower[1, "95%"],",",pb_holt$upper[1, "95%"],")"))
print(paste("95% PI of Holt's Hardcover: (",hc_holt$lower[1, "95%"],",",hc_holt$upper[1, "95%"],")"))## [1] "95% PI of SES Paperback: ( 141.179798900711 , 273.039531089726 )"
## [1] "95% PI of SES Hardcover: ( 176.97530263223 , 302.144881371293 )"
## [1] "95% PI of Holt's Paperback: ( 148.438396409231 , 270.495134632871 )"
## [1] "95% PI of Holt's Hardcover: ( 196.874459367927 , 303.473285056784 )"
## [1] "95% PI of SES Paperback: ( 138.867024106993 , 275.352305883445 )"
## [1] "95% PI of SES Hardcover: ( 174.779870851487 , 304.340313152036 )"
## [1] "95% PI of Holt's Paperback: ( 143.912985820256 , 275.020545221845 )"
## [1] "95% PI of Holt's Hardcover: ( 192.922170771941 , 307.42557365277 )"
1.4 Ex. 7.7
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
1.4.1 Part a
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
Answer:
The eggs dataset has an overall decreasing trend, with no obvious seasonal or cyclic pattern.
Using the holt() function, I forecast with (1) Holt’s linear method, (2) Holt’s method with a damped trend, (3) Holt’s method with a Box-Cox transformation, and (4) Holt’s method with a Box-Cox transformation plus a damped trend. All four forecasts are plotted on the same graph below.
#eggs
autoplot(eggs) +
xlab("Year") + ylab("Price (Dollar)") +
ggtitle("Price of A Dozen Eggs in USA (1900-1993)")eggs_holt <- holt(eggs, h=100) #holt
eggs_holt_damped <- holt(eggs, h=100, damped=TRUE) #holt & damped
eggs_holt_boxcox <- holt(eggs, h=100, lambda="auto") #holt & boxcox
eggs_holt_boxcox_damped <- holt(eggs, h=100, lambda="auto", damped=TRUE) #holt & boxcox & damped
autoplot(eggs) +
autolayer(eggs_holt, series="Holt", PI=FALSE) +
autolayer(eggs_holt_damped, series="Holt-Damped", PI=FALSE) +
autolayer(eggs_holt_boxcox, series="Holt-Boxcox", PI=FALSE) +
autolayer(eggs_holt_boxcox_damped, series="Holt-Boxcox-Damped", PI=FALSE) +
xlab("Year") + ylab("Price (Dollar)") +
ggtitle("Price of A Dozen Eggs in USA (1900-1993)")1.4.2 Part b
Which model gives the best RMSE?
Answer:
- The RMSE values of all the methods are:
| Method | RMSE |
|---|---|
| Holt + Box-Cox | 26.3938 |
| Holt + Box-Cox + damped | 26.5332 |
| Holt + damped | 26.5402 |
| Holt | 26.5822 |
- Holt’s linear method with the Box-Cox transformation performs best, with the lowest RMSE of 26.3938.
print(paste("RMSE of Holt's: ",accuracy(eggs_holt)[2]))
print(paste("RMSE of damped Holt's: ",accuracy(eggs_holt_damped)[2]))
print(paste("RMSE of Boxcox Holt's: ",accuracy(eggs_holt_boxcox)[2]))
print(paste("RMSE of Boxcox damped Holt's: ",accuracy(eggs_holt_boxcox_damped)[2]))## [1] "RMSE of Holt's: 26.5821881569903"
## [1] "RMSE of damped Holt's: 26.5401852798325"
## [1] "RMSE of Boxcox Holt's: 26.3937555344296"
## [1] "RMSE of Boxcox damped Holt's: 26.5332106346358"
1.5 Ex. 7.8
Recall your retail time series data (from Exercise 3 in Section 2.10).
(a.) Why is multiplicative seasonality necessary for this series?
(b.) Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
(c.) Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
(d.) Check that the residuals from the best method look like white noise.
(e.) Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?
1.5.1 Part a
Why is multiplicative seasonality necessary for this series?
Answer:
From the textbook (Sec. 7.3), the multiplicative method is preferred when the seasonal variation changes in proportion to the level of the series. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component.
Multiplicative seasonality is necessary here because the seasonal swings in this time series grow roughly in proportion to the level of the series over time.
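For reference, in Holt-Winters’ multiplicative method (textbook Sec. 7.3) the seasonal factor multiplies, rather than adds to, the trend component in the forecast equation:
\[
\hat{y}_{t+h|t} = (\ell_{t} + h b_{t})\, s_{t+h-m(k+1)}
\]
where \(m\) is the seasonal period (12 for this monthly series) and \(k\) is the integer part of \((h-1)/m\), so the forecast scales the seasonal swings with the level of the series.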
retail <- import("https://raw.githubusercontent.com/shirley-wong/Data-624/main/HW1/retail.xlsx",
skip=1) #this excel sheet has two header rows
myts <- ts(retail[,"A3349746K"], frequency=12, start=c(1982,4))
autoplot(myts) +
xlab("Year") + ylab("Sales") +
ggtitle("Turnover-Western Australia-Total(Industry) Time Series")1.5.2 Part b
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
Answer:
The multiplicative forecast has an obvious increasing trend.
The zoomed-in graph shows that the damped multiplicative forecast looks nearly flat but actually retains a very slight increasing trend.
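This matches the damped form of the forecast equation, in which the trend contribution is a geometric sum governed by the damping parameter \(0 < \phi < 1\):
\[
\hat{y}_{t+h|t} = \left[\ell_{t} + (\phi + \phi^{2} + \dots + \phi^{h})\, b_{t}\right] s_{t+h-m(k+1)}
\]
As \(h \to \infty\) the trend term approaches the finite limit \(b_{t}\,\phi/(1-\phi)\), which is why the damped forecast flattens out rather than growing without bound.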
myts_hw <- hw(myts, seasonal="multiplicative", h=100)
myts_hw_damped <- hw(myts, seasonal="multiplicative", h=100, damped=TRUE)
autoplot(myts) +
autolayer(myts_hw, series="Multiplicative", PI=FALSE) +
autolayer(myts_hw_damped, series="Multiplicative-Damped", PI=FALSE) +
xlab("Year") + ylab("Sales") +
ggtitle("Turnover-Western Australia-Total(Industry) Time Series") +
theme(legend.position="bottom")autoplot(myts) +
autolayer(myts_hw, series="Multiplicative", PI=FALSE, alpha=0.2) +
autolayer(myts_hw_damped, series="Multiplicative-Damped", PI=FALSE) +
xlab("Year") + ylab("Sales") +
ggtitle("**Zoom-in on the forecast part") +
theme(legend.position="bottom") +
xlim(c(2013,2023)) + ylim(c(2000,4800))
1.5.3 Part c
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
Answer:
- The RMSE values of the one-step forecasts from the two methods are:
| Method | RMSE |
|---|---|
| Holt-Winters’ multiplicative | 31.6938 |
| Holt-Winters’ multiplicative damped | 31.5360 |
- I prefer the Holt-Winters’ multiplicative damped method, as it has the smaller RMSE.
myts_hw_1 <- hw(myts, seasonal="multiplicative", h=1)
myts_hw_damped_1 <- hw(myts, seasonal="multiplicative", h=1, damped=TRUE)
print(paste("RMSE of Holt-Winter's multiplicative method: ",accuracy(myts_hw_1)[2]))
print(paste("RMSE of Holt-Winter's multiplicative damped method: ",accuracy(myts_hw_damped_1)[2]))## [1] "RMSE of Holt-Winter's multiplicative method: 31.6937872890862"
## [1] "RMSE of Holt-Winter's multiplicative damped method: 31.5359595784103"
1.5.4 Part d
Check that the residuals from the best method look like white noise.
Answer:
Let’s study the residuals from the Holt-Winters’ multiplicative damped method, since it was the better model in Part c.
There are several large positive and negative spikes in the residual time plot between 1995 and 2001.
The ACF plot shows several significant autocorrelations among the residuals.
The histogram of the residuals is roughly normal with a mean very close to 0.
The p-value from the Ljung-Box test is essentially 0.
\(\therefore\) Although the residuals are centred near zero, the significant autocorrelations and the near-zero Ljung-Box p-value mean they do not look like white noise.
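The plots and the test summarised above can be reproduced with checkresiduals() (a sketch, assuming the damped model object myts_hw_damped from Part b):
checkresiduals(myts_hw_damped) #residual time plot, ACF, histogram, and Ljung-Box test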
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 341.35, df = 7, p-value < 2.2e-16
##
## Model df: 17. Total lags used: 24
1.5.5 Part e
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?
Answer:
- The RMSE values for the three methods are:
| RMSE by method | Training Set | Test Set |
|---|---|---|
| Holt-Winters’ multiplicative | 28.76835 | 257.16472 |
| Holt-Winters’ multiplicative damped | 28.82823 | 320.82257 |
| Seasonal Naive | 83.42504 | 381.15950 |
- The Holt-Winters’ multiplicative method performs best here, with the lowest training and test RMSE. Both Holt-Winters methods beat the seasonal naive approach on the test set.
myts_train <- window(myts, end=c(2010,12))
myts_test <- window(myts, start=2011)
myts_train_hw <- hw(myts_train, h=36, seasonal="multiplicative")
myts_train_hw_damped <- hw(myts_train, h=36, seasonal="multiplicative", damped=TRUE)
myts_train_snaive <- snaive(myts_train, h=36)
print("RMSE of Holt-Winter's multiplicative method: ")
accuracy(myts_train_hw, myts_test)[,2]
print("RMSE of Holt-Winter's multiplicative damped method: ")
accuracy(myts_train_hw_damped, myts_test)[,2]
print("RMSE of seasonal naive method: ")
accuracy(myts_train_snaive, myts_test)[,2]
## [1] "RMSE of Holt-Winter's multiplicative method: "
## Training set Test set
## 28.76835 257.16472
## [1] "RMSE of Holt-Winter's multiplicative damped method: "
## Training set Test set
## 28.82823 320.82257
## [1] "RMSE of seasonal naive method: "
## Training set Test set
## 83.42504 381.15950
autoplot(myts_train) +
autolayer(myts_train_hw, series="Multiplicative", PI=FALSE) +
autolayer(myts_train_hw_damped, series="Multiplicative-Damped", PI=FALSE) +
autolayer(myts_train_snaive, series="Seasonal Naive", PI=FALSE) +
autolayer(myts_test, series="Test Set") +
xlab("Year") + ylab("Sales") +
ggtitle("Turnover-Western Australia-Total(Industry) Time Series") +
theme(legend.position="bottom")autoplot(myts_train) +
autolayer(myts_train_hw, series="Multiplicative", PI=FALSE) +
autolayer(myts_train_hw_damped, series="Multiplicative-Damped", PI=FALSE) +
autolayer(myts_train_snaive, series="Seasonal Naive", PI=FALSE) +
autolayer(myts_test, series="Test Set") +
xlab("Year") + ylab("Sales") +
ggtitle("**Zoom-in on the forecast part") +
theme(legend.position="bottom") +
xlim(c(2010,2014)) + ylim(c(1800,3500))
1.6 Ex. 7.9
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
1.6.1 Answer
Answer:
I use the stlf() function to do this. The textbook (Sec. 6.8) introduces stlf() in its closing paragraphs:
“The stlf() function is a short-cut approach to decompose the time series using STL, forecast the seasonally adjusted series, and return the reseasonalised forecasts. The stlf() function uses mstl() to carry out the decomposition, so there are default values for s.window and t.window. If method is not specified, it will use the ETS approach applied to the seasonally adjusted series.” “Use the lambda argument if you think a Box-Cox transformation is required.”
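As a rough sketch only (not stlf()’s exact internals, which also handle re-seasonalising and back-transformation), the equivalent manual steps are:
lam <- BoxCox.lambda(myts_train) #choose a Box-Cox lambda
dcmp <- mstl(BoxCox(myts_train, lambda=lam)) #STL decomposition of the transformed series
sa <- seasadj(dcmp) #seasonally adjusted series
fc <- forecast(ets(sa), h=36) #ETS forecast of the seasonally adjusted series
#stlf() then re-seasonalises the forecasts and back-transforms them to the original scale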
From the graphs below, the STLF forecast is closer to our test set than the Holt-Winters’ multiplicative method.
The RMSE of the stlf() forecast is also the lowest compared to the values from Ex. 7.8:
| RMSE on each method | Training Set | Test Set |
|---|---|---|
| STLF | 25.37597 | 125.22173 |
| Holt-Winter’s multiplicative | 28.76835 | 257.16472 |
| Holt-Winter’s multiplicative damped | 28.82823 | 320.82257 |
| Seasonal Naive | 83.42504 | 381.15950 |
Therefore, an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data, performs the best among all the methods.
myts_train_stlf <- stlf(myts_train, lambda=BoxCox.lambda(myts_train),
method="ets", allow.multiplicative.trend=TRUE, h=36)## [1] "RMSE of STLF method: "
## Training set Test set
## 25.37597 125.22173
autoplot(myts_train) +
autolayer(myts_train_hw, series="Multiplicative", PI=FALSE) +
autolayer(myts_train_stlf, series="STLF", PI=FALSE) +
autolayer(myts_test, series="Test Set") +
xlab("Year") + ylab("Sales") +
ggtitle("Turnover-Western Australia-Total(Industry) Time Series") +
theme(legend.position="bottom")autoplot(myts_train) +
autolayer(myts_train_hw, series="Multiplicative", PI=FALSE) +
autolayer(myts_train_stlf, series="STLF", PI=FALSE) +
autolayer(myts_test, series="Test Set") +
xlab("Year") + ylab("Sales") +
ggtitle("**Zoom-in on the forecast part") +
theme(legend.position="bottom") +
xlim(c(2010,2014)) + ylim(c(1800,3500))