Assume that we apply a moving average to a series, using a very short window span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?
Both moving averages and exponential smoothing methods require a single variable to be input by the user; these are window width (w) and the smoothing constant (\(\alpha\)) respectively. The value entered determines how recent information is treated compared to older information. For \(\alpha\), a higher value will give more weight to recent observations. The inverse is true for w, where a lower number represents a narrower window and more preference to recent values. We can calculate when the two smoothers are approximately equal by using the formula: w = 2/\(\alpha\) - 1. In this example we are asked what value should be given to a smoothing constant to match a moving average with a very short window span. We will use the formula to determine which value the smoothing constant would receive for window widths of 1, 2, 3, and 4.
smooth_wind <- data.frame(
window = c(1, 2, 3, 4),
constant = c(1, .67, .5, .4))
knitr::kable(smooth_wind,booktabs = T)| window | constant |
|---|---|
| 1 | 1.00 |
| 2 | 0.67 |
| 3 | 0.50 |
| 4 | 0.40 |
dept_sales <- read.csv("DeptStoreSales.csv")
dept_ts <- ts(dept_sales$Sales, freq = 4)
autoplot(dept_ts)train_len <- length(dept_ts) - 4
dept_train <- window(dept_ts, end = c(1, train_len))
dept_test <- window(dept_ts, start=c(1, train_len + 1))Which of the following methods would not be suitable for forecasting this series. Explain why or why not for each one.
Moving average of raw series
Moving average of deseasonalized series
Simple exponential smoothing of raw series
Double exponential smoothing of raw series
Holt-Winter’s exponential smoothing of the raw series
A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The foreccaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the hw function with the parameter seasonal=“multiplicative”. Let the method pick the appropriate parameters for \(\alpha\) , \(\beta\) , and \(\gamma\)
Run this method on the data. Request the forecasts on the validation period.(Note that the forecasted values for the validation set will be different than what the book shows.)
dept_hw <- hw(dept_train, seasonal = "multiplicative", h = 4)
summary(dept_hw)##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = dept_train, h = 4, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4032
## beta = 0.1429
## gamma = 0.4549
##
## Initial states:
## l = 57401.8119
## b = 605.4045
## s=1.3012 0.9795 0.8614 0.8579
##
## sigma: 0.0258
##
## AIC AICc BIC
## 372.3936 390.3936 381.3552
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
## ACF1
## Training set -0.07882461
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 61334.90 59303.21 63366.58 58227.70 64442.09
## 6 Q2 64971.30 62529.36 67413.25 61236.67 68705.94
## 6 Q3 76718.11 73376.84 80059.37 71608.08 81828.13
## 6 Q4 99420.55 94372.29 104468.81 91699.90 107141.20
autoplot(dept_hw)Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.
Below we compute the MAPE for Quarter 21 using the accuracy() function as well as the MAPE function within the MLmetrics package. We see the MAPE value is quite low, only .88%.
kable(accuracy(dept_hw$mean, dept_test, test = 1))| ME | RMSE | MAE | MPE | MAPE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|
| Test set | -534.8988 | 534.8988 | 534.8988 | -0.8797678 | 0.8797678 | NA | NaN |
MAPE(y_pred = dept_hw$mean[1], y_true = dept_test[1])## [1] 0.008797678
Here we have the same measurements except for Quarter 22. Again we see a very low MAPE value of .11% that is agreed on between the two methods used.
kable(accuracy(dept_hw$mean[2], dept_test[2]))| ME | RMSE | MAE | MPE | MAPE | |
|---|---|---|---|---|---|
| Test set | -71.303 | 71.303 | 71.303 | -0.1098659 | 0.1098659 |
MAPE(y_pred = dept_hw$mean[2], y_true = dept_test[2])## [1] 0.001098659
When the two quarters are combined for analysis we see a central value of .50% emerge for the MAPE.
kable(accuracy(dept_hw$mean[1:2], dept_test[1:2]))| ME | RMSE | MAE | MPE | MAPE | |
|---|---|---|---|---|---|
| Test set | -303.1009 | 381.5763 | 303.1009 | -0.4948169 | 0.4948169 |
MAPE(y_pred = dept_hw$mean[1:2], y_true = dept_test[1:2])## [1] 0.004948169
The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?
plot(c(1,7), c(0, 100000), type = "n", xlab="Years",ylab="Sales (thousands)",bty="l", xaxt="n", yaxt="n")
axis(1, at = seq(1, 7, 1), labels = format(seq(1, 7, 1)))
axis(2, at = seq(0, 100000, 20000), labels = format(seq(0, 100000, 20000)))
lines(dept_ts, bty = "l")
lines(dept_hw$fitted, col = "red", lty = 2, lwd = 2)
lines(dept_hw$mean, col = "red", lty = 2, lwd = 2)
lines(dept_test)
legend(5,24000, c("Actual", "Forecast"),lwd=2, lty=c(1,2), col=c("black", "red"), bty="n")
title("Exp Smoothing: Actual Vs Forecast (Training Data)")checkresiduals(dept_hw)##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 10.618, df = 3, p-value = 0.01398
##
## Model df: 8. Total lags used: 11
With the low MAPE scores for both Quarters 21 and 22 we already can see that there is not a large degree of deviation between forecast and validation points. Also review of the residuals helps us establish that we mostly follow a normal distribution and that the p-value is significant when using a total lag value of 4. This model seems to be suitable for forecasting the 2 quarters in question.
Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the the opposite order? Show a the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.
par(mfrow = c(2,3))
plot(dept_ts, main = "Raw Series")
plot(diff(dept_ts, lag = 1), main = "Trend Removed (Lag-1)")
plot(diff(diff(dept_ts, lag = 1), lag = 4), main = "+ Seasonality Removed (Lag-1/Lag-4)")
plot(dept_ts, main = "Raw Series")
plot(diff(dept_ts, lag = 4), main = "Seasonality Removed (Lag-4)")
plot(diff(diff(dept_ts, lag = 4), lag = 1), main = "+ Trend Removed (Lag-4/Lag-1)")It would seem there isn’t much difference between the two methods and both seem to result in the same series. While it is true that the plots are quite differences after the first round of differencing, they are identical after the second. This is ideal as you would expect a model that removes both trend and seasonality to be effective at doing this in no particular order. If a specific order was involved it would seem to imply that one measurement is dependent on the other; if that was the case we would not treat them as seperate parts of a time-series. To be fair, there are surely many series where seasonality and trend are correlated, but if the order of these calculations truly impacted the result, it would prove that they are inseperable.
Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.
First we must convert what we found in the previous question back into a format that matches the original time series. (Not the most fun of tasks!) We can use the loop below to reintroduce our new and improved differenced model to the series it seeks to match. The two forecast values for Quarters 21 and 22 are listed last below (64080.58 and 68374.16, respectively)
act_fc <- vector()
diff_fc <- meanf(diff(diff(dept_ts, lag = 1), lag = 4), h = 4)
for (i in 1:4) {
if(i == 1) {
act_fc[i] <- diff_fc$mean[i] + dept_train[(train_len + i) - 4] + (dept_train[train_len] - dept_train[train_len - 4])
} else {
act_fc[i] <- diff_fc$mean[i] + dept_train[(train_len + i)- 4] + (act_fc[i - 1] - dept_train[train_len + i - 1 - 4])
}
}
act_fc[1:2]## [1] 64080.58 68374.16
Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.
accuracy(act_fc[1:2], dept_test[1:2])## ME RMSE MAE MPE MAPE
## Test set -3377.368 3378.755 3377.368 -5.374391 5.374391
accuracy(dept_hw$mean[1:2], dept_test[1:2])## ME RMSE MAE MPE MAPE
## Test set -303.1009 381.5763 303.1009 -0.4948169 0.4948169
In this case I would choose the exponential smoothing forecast due to its much lower RMSE and MAPE values. While over-time, this method may prove to be an overfit to the series, for now it seems a good model for accounting for the seasonality and trend currently exhibited within the historical values. As with all time series it would be worth monitoring the series into the future to ensure that the model maintains adequate levels of forecasting accuracy.
What is an even simpler approach that should be compared as a baseline? Complete that comparison.
The Naive method should be used as a baseline whenever possible due to its simplicity and ease to understand. In this case we already have identified the presence of seasonality within the series, therefore a seasonal naive method would be most relevant in this example.
dept_naive <- snaive(dept_train, h = 4)
autoplot(dept_naive)accuracy(dept_naive, dept_test)## ME RMSE MAE MPE MAPE MASE
## Training set 2925.25 3878.784 3116.50 4.344358 4.737739 1.000000
## Test set 6482.25 7032.176 6482.25 8.170540 8.170540 2.079978
## ACF1 Theil's U
## Training set 0.54855812 NA
## Test set 0.01334402 0.4705349
accuracy(dept_hw, dept_test)## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
## Test set 897.285 1981.638 1200.3859 0.7906391 1.285456 0.3851712
## ACF1 Theil's U
## Training set -0.078824613 NA
## Test set 0.009540924 0.1291442
In this situation we can see from the RMSE and MAPE values that the seasonal naive model does not provide the same level of accuracy as the Holt-Winter’s model that we calculate back during part b. It certainly is worth calculating the naive method early on in the analysis though, as it does not require the variable specification and finesse that other models may need (and who knows, it might end up being more accurate in the end).
wine_df <- read.csv("AustralianWines.csv")
par(mfrow = c(2,3))
for (i in 2:7) {
temp_ts <- ts(wine_df[,i], start = c(1980,1))
plot (temp_ts, main = names(wine_df[i]))
}I would choose the Holt-Winter’s smoothing method as at least half of the series show clear signs of containing both seasonality and trend. As Holt-Winter’s exponential smoothing is adept at handling both of these elements it would be best if only one method can be used across all 6 series.
wine_ts <- ts(wine_df$Fortified, start = c(1980,1), frequency = 12)
wine_train <- window(wine_ts, end = c(1993, 12))
wine_test <- window(wine_ts, start = c(1994, 1))
wine_hw <- hw(wine_train, seasonal = "multiplicative", h = 12)
autoplot(wine_hw)checkresiduals(wine_hw)##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
##
## Model df: 16. Total lags used: 24
plot(wine_hw$residuals)Decembers (month 12) are not captured well by the model
At first glance this seems to be false as the 12 month lag point of the ACF plot shows a large (and significant) spike. Fast forward the lag 24 or 36 months though and the effect diminishes while most other months remain constain.
There is a strong correlation between sales on the same calendar month
Although the Holt-Winter’s model seeks to strip seasonality and trend, there are apparent months within the lag/residual sequence that continue to bear signs of being correlated depending on month/sales.
The model does not capture seasonality well
While the p-value of the ljung-box test is significant, I will agree in the slightest due to the power of the human eye. It does appear that some elements of seasonality remain with the plot despite the efforts of the Holt-Winters method.
Between the evident seasonality still within the residuals and the obvious coxing of Question 8ci(1) it might be worth exploring the presence of multiple levels of seasonality within the model. Especially with December representing such an extreme value even after smoothing has occurred, it might be worth exploring methods such as Taylor (2003) double-seasonal Holt-Winter’s method or the TBATS/STL+ approaches. R makes this easy with the use of the dshw/tbats/stlm functions; respective to the order of mention within this response.