Question 2

Relationship between Moving Average and Exponential Smoothing

Assume that we apply a moving average to a series, using a very short window span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

Both moving averages and exponential smoothing methods require a single variable to be input by the user; these are window width (w) and the smoothing constant (\(\alpha\)) respectively. The value entered determines how recent information is treated compared to older information. For \(\alpha\), a higher value will give more weight to recent observations. The inverse is true for w, where a lower number represents a narrower window and more preference to recent values. We can calculate when the two smoothers are approximately equal by using the formula: w = 2/\(\alpha\) - 1. In this example we are asked what value should be given to a smoothing constant to match a moving average with a very short window span. We will use the formula to determine which value the smoothing constant would receive for window widths of 1, 2, 3, and 4.

smooth_wind <- data.frame(
    window = c(1, 2, 3, 4),
    constant = c(1, .67, .5, .4))
knitr::kable(smooth_wind,booktabs = T)

window	constant
1	1.00
2	0.67
3	0.50
4	0.40

Question 5

dept_sales <- read.csv("DeptStoreSales.csv")
dept_ts <- ts(dept_sales$Sales, freq = 4)
autoplot(dept_ts)

train_len <- length(dept_ts) - 4

dept_train <- window(dept_ts, end = c(1, train_len))
dept_test <- window(dept_ts, start=c(1, train_len + 1))

A

Which of the following methods would not be suitable for forecasting this series. Explain why or why not for each one.

Moving average of raw series

Plotting the series reveals strong signs of seasonality and trend. Moving averages on raw time-series can only be done when they lack seasonality and trend.

Moving average of deseasonalized series

While seasonality has been removed, the time-series still contains a trend. A moving average would not be suitable for this series even after the first transformation.

Simple exponential smoothing of raw series

Simple exponential smoothing is very similar to moving averages in that they both should be used only when a series has no trend or seasonality. In this example we should not use a simple exponential smoothing method as the series contains both of these.

Double exponential smoothing of raw series

While double exponential smoothing is compatible with series containing additive/multiplicative trends, it is not suitable when seasonality is present. This model would not be suitable for the department store series.

Holt-Winter’s exponential smoothing of the raw series

Holt-Winter’s exponential smoothing method would be suitable for this series as it can work with both seasonality and trends. Where this model differs from double exponential smoothing is in its ability to identify what season a value k steps ahead is in. This can only be done if a full cycle of the seasons is available by the point where forecasting begins.

B

A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The foreccaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the hw function with the parameter seasonal=“multiplicative”. Let the method pick the appropriate parameters for \(\alpha\) , \(\beta\) , and \(\gamma\)

i

Run this method on the data. Request the forecasts on the validation period.(Note that the forecasted values for the validation set will be different than what the book shows.)

dept_hw <- hw(dept_train, seasonal = "multiplicative", h = 4)
summary(dept_hw)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = dept_train, h = 4, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.4032 
##     beta  = 0.1429 
##     gamma = 0.4549 
## 
##   Initial states:
##     l = 57401.8119 
##     b = 605.4045 
##     s=1.3012 0.9795 0.8614 0.8579
## 
##   sigma:  0.0258
## 
##      AIC     AICc      BIC 
## 372.3936 390.3936 381.3552 
## 
## Error measures:
##                   ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
##                     ACF1
## Training set -0.07882461
## 
## Forecasts:
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       61334.90 59303.21  63366.58 58227.70  64442.09
## 6 Q2       64971.30 62529.36  67413.25 61236.67  68705.94
## 6 Q3       76718.11 73376.84  80059.37 71608.08  81828.13
## 6 Q4       99420.55 94372.29 104468.81 91699.90 107141.20

autoplot(dept_hw)

ii

Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.

Quarter 21

Below we compute the MAPE for Quarter 21 using the accuracy() function as well as the MAPE function within the MLmetrics package. We see the MAPE value is quite low, only .88%.

kable(accuracy(dept_hw$mean, dept_test, test = 1))

	ME	RMSE	MAE	MPE	MAPE	ACF1	Theil’s U
Test set	-534.8988	534.8988	534.8988	-0.8797678	0.8797678	NA	NaN

MAPE(y_pred = dept_hw$mean[1], y_true = dept_test[1])

## [1] 0.008797678

Here we have the same measurements except for Quarter 22. Again we see a very low MAPE value of .11% that is agreed on between the two methods used.

Quarter 22

kable(accuracy(dept_hw$mean[2], dept_test[2]))

	ME	RMSE	MAE	MPE	MAPE
Test set	-71.303	71.303	71.303	-0.1098659	0.1098659

MAPE(y_pred = dept_hw$mean[2], y_true = dept_test[2])

## [1] 0.001098659

When the two quarters are combined for analysis we see a central value of .50% emerge for the MAPE.

Combined Quarters

kable(accuracy(dept_hw$mean[1:2], dept_test[1:2]))

	ME	RMSE	MAE	MPE	MAPE
Test set	-303.1009	381.5763	303.1009	-0.4948169	0.4948169

MAPE(y_pred = dept_hw$mean[1:2], y_true = dept_test[1:2])

## [1] 0.004948169

C

The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?

plot(c(1,7), c(0, 100000), type = "n", xlab="Years",ylab="Sales (thousands)",bty="l", xaxt="n", yaxt="n")
axis(1, at = seq(1, 7, 1), labels = format(seq(1, 7, 1)))
axis(2, at = seq(0, 100000, 20000), labels = format(seq(0, 100000, 20000)))
lines(dept_ts, bty = "l")

lines(dept_hw$fitted, col = "red", lty = 2, lwd = 2)
lines(dept_hw$mean, col = "red", lty = 2, lwd = 2)
lines(dept_test)
legend(5,24000, c("Actual", "Forecast"),lwd=2, lty=c(1,2), col=c("black", "red"), bty="n")
title("Exp Smoothing: Actual Vs Forecast (Training Data)")

checkresiduals(dept_hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 10.618, df = 3, p-value = 0.01398
## 
## Model df: 8.   Total lags used: 11

With the low MAPE scores for both Quarters 21 and 22 we already can see that there is not a large degree of deviation between forecast and validation points. Also review of the residuals helps us establish that we mostly follow a normal distribution and that the p-value is significant when using a total lag value of 4. This model seems to be suitable for forecasting the 2 quarters in question.

D

Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the the opposite order? Show a the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.

par(mfrow = c(2,3))
plot(dept_ts, main = "Raw Series")
plot(diff(dept_ts, lag = 1), main = "Trend Removed (Lag-1)")
plot(diff(diff(dept_ts, lag = 1), lag = 4), main = "+ Seasonality Removed (Lag-1/Lag-4)")

plot(dept_ts, main = "Raw Series")
plot(diff(dept_ts, lag = 4), main = "Seasonality Removed (Lag-4)")
plot(diff(diff(dept_ts, lag = 4), lag = 1), main = "+ Trend Removed (Lag-4/Lag-1)")

It would seem there isn’t much difference between the two methods and both seem to result in the same series. While it is true that the plots are quite differences after the first round of differencing, they are identical after the second. This is ideal as you would expect a model that removes both trend and seasonality to be effective at doing this in no particular order. If a specific order was involved it would seem to imply that one measurement is dependent on the other; if that was the case we would not treat them as seperate parts of a time-series. To be fair, there are surely many series where seasonality and trend are correlated, but if the order of these calculations truly impacted the result, it would prove that they are inseperable.

E

Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.

First we must convert what we found in the previous question back into a format that matches the original time series. (Not the most fun of tasks!) We can use the loop below to reintroduce our new and improved differenced model to the series it seeks to match. The two forecast values for Quarters 21 and 22 are listed last below (64080.58 and 68374.16, respectively)

act_fc <- vector()
diff_fc <- meanf(diff(diff(dept_ts, lag = 1), lag = 4), h = 4)

for (i in 1:4) {
  
  if(i == 1) {
    
    act_fc[i] <- diff_fc$mean[i] + dept_train[(train_len + i) - 4] + (dept_train[train_len] - dept_train[train_len - 4])
    
  } else {
    
    act_fc[i] <- diff_fc$mean[i] + dept_train[(train_len + i)- 4] + (act_fc[i - 1] - dept_train[train_len + i - 1 - 4])
  }
}

act_fc[1:2]

## [1] 64080.58 68374.16

F

Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.

accuracy(act_fc[1:2], dept_test[1:2])

##                 ME     RMSE      MAE       MPE     MAPE
## Test set -3377.368 3378.755 3377.368 -5.374391 5.374391

accuracy(dept_hw$mean[1:2], dept_test[1:2])

##                 ME     RMSE      MAE        MPE      MAPE
## Test set -303.1009 381.5763 303.1009 -0.4948169 0.4948169

In this case I would choose the exponential smoothing forecast due to its much lower RMSE and MAPE values. While over-time, this method may prove to be an overfit to the series, for now it seems a good model for accounting for the seasonality and trend currently exhibited within the historical values. As with all time series it would be worth monitoring the series into the future to ensure that the model maintains adequate levels of forecasting accuracy.

G

What is an even simpler approach that should be compared as a baseline? Complete that comparison.

The Naive method should be used as a baseline whenever possible due to its simplicity and ease to understand. In this case we already have identified the presence of seasonality within the series, therefore a seasonal naive method would be most relevant in this example.

dept_naive <- snaive(dept_train, h = 4)
autoplot(dept_naive)

accuracy(dept_naive, dept_test)

##                   ME     RMSE     MAE      MPE     MAPE     MASE
## Training set 2925.25 3878.784 3116.50 4.344358 4.737739 1.000000
## Test set     6482.25 7032.176 6482.25 8.170540 8.170540 2.079978
##                    ACF1 Theil's U
## Training set 0.54855812        NA
## Test set     0.01334402 0.4705349

accuracy(dept_hw, dept_test)

##                   ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 246.631 1499.632  977.6487 0.3530739 1.665686 0.3137009
## Test set     897.285 1981.638 1200.3859 0.7906391 1.285456 0.3851712
##                      ACF1 Theil's U
## Training set -0.078824613        NA
## Test set      0.009540924 0.1291442

In this situation we can see from the RMSE and MAPE values that the seasonal naive model does not provide the same level of accuracy as the Holt-Winter’s model that we calculate back during part b. It certainly is worth calculating the naive method early on in the analysis though, as it does not require the variable specification and finesse that other models may need (and who knows, it might end up being more accurate in the end).

Question 8

A

wine_df <- read.csv("AustralianWines.csv")
par(mfrow = c(2,3))
for (i in 2:7) {
temp_ts <- ts(wine_df[,i], start = c(1980,1))
plot (temp_ts, main = names(wine_df[i]))
}

I would choose the Holt-Winter’s smoothing method as at least half of the series show clear signs of containing both seasonality and trend. As Holt-Winter’s exponential smoothing is adept at handling both of these elements it would be best if only one method can be used across all 6 series.

B

wine_ts <- ts(wine_df$Fortified, start = c(1980,1), frequency = 12)
wine_train <- window(wine_ts, end = c(1993, 12))
wine_test <- window(wine_ts, start = c(1994, 1))

wine_hw <- hw(wine_train, seasonal = "multiplicative", h = 12)
autoplot(wine_hw)

C

checkresiduals(wine_hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
## 
## Model df: 16.   Total lags used: 24

plot(wine_hw$residuals)

i

Decembers (month 12) are not captured well by the model
At first glance this seems to be false as the 12 month lag point of the ACF plot shows a large (and significant) spike. Fast forward the lag 24 or 36 months though and the effect diminishes while most other months remain constain.
There is a strong correlation between sales on the same calendar month
Although the Holt-Winter’s model seeks to strip seasonality and trend, there are apparent months within the lag/residual sequence that continue to bear signs of being correlated depending on month/sales.
The model does not capture seasonality well
While the p-value of the ljung-box test is significant, I will agree in the slightest due to the power of the human eye. It does appear that some elements of seasonality remain with the plot despite the efforts of the Holt-Winters method.

ii

Between the evident seasonality still within the residuals and the obvious coxing of Question 8ci(1) it might be worth exploring the presence of multiple levels of seasonality within the model. Especially with December representing such an extreme value even after smoothing has occurred, it might be worth exploring methods such as Taylor (2003) double-seasonal Holt-Winter’s method or the TBATS/STL+ approaches. R makes this easy with the use of the dshw/tbats/stlm functions; respective to the order of mention within this response.

MBA 678 Assignment 4

Colin Vail

February 26, 2018