Homework 4: Questions 2, 5, and 8 from Chapter 6


Problem 2: Assume that we apply a moving average to a series, using a very short window span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?.

The moving average is a simple smoother; it consists of averaging values across a window of consecutive periods, thereby generating a series of averages. A moving average with window width (W) means averaging across each set of w consecutive values, where w is determined by the user. For visualization, wider windows will expose more global trends, while narrow windows will reveal local trend (but may be subject to under-smoothing)

Simple exponential smoothing is similar to forecasting with a moving average, except that instead of taking a simple average over the w most recent values, we take a weighted average of all past values, so that the weights decrease exponentially into the past. In order to achieve an equivalent result using simple exponential smoothing, the smoothing constant needs to be closer to 1, since it indicates fast learning and only the most recent values influence the forecast. (Whereas a value close to 0 indicates slow learning and past observations have a large influence on forecast).

Problem 5: Forecasting Department Store Sales.

a. Which of the following methods would not be suitable for forecasting this series. Explain why or why not for each one.

Moving average of raw series. Not suitable: Moving averages can generally only be used for series without trend and seasonality. This series has both.

Moving average of deseasonalized series. Not suitable for the same reason, although it could be used if the series was both deseasonalized and de-trended.

Simple exponential smoothing of the raw series. Not suitable: Like moving averages, this should only be used on series without trend and seasonality.

Double exponential smoothing of the raw series. Not suitable: Double exponential smoothing works on series with trend, but not seasonality.

Holt-Winter’s exponential smoothing of the raw series. Suitable, because it works on series with trend, seasonality and level over time.

b. A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the hw function with the parameter seasonal=“multiplicative”. Let the method pick the appropriate parameters for ??, ??, and ??.

i. Run this method on the data. Request the forecasts on the validation period.(Note that the forecasted values for the validation set will be different than what the book shows.).

library(forecast)
library(zoo)
library(dplyr)
library(timeDate)
DeptStoreSales<- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
str(DeptStoreSales)
## 'data.frame':    24 obs. of  2 variables:
##  $ Quarter: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sales  : int  50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
DeptStoreSales.TS <- ts(DeptStoreSales$Sales, start=c(1,1), frequency=4)
yrange = range(DeptStoreSales.TS)
ValidLength <- 4
TrainLength <-length(DeptStoreSales.TS) - ValidLength

DeptTrain <- window(DeptStoreSales.TS, start = c(1), end=c(1, TrainLength))
DeptValid <- window(DeptStoreSales.TS, start=c(1,TrainLength + 1), end=c(1, ValidLength + TrainLength))
HWSales <- hw(DeptTrain, seasonal="multiplicative", h=4)
summary(HWSales)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = DeptTrain, h = 4, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.4032 
##     beta  = 0.1429 
##     gamma = 0.4549 
## 
##   Initial states:
##     l = 57401.8119 
##     b = 605.4045 
##     s=1.3012 0.9795 0.8614 0.8579
## 
##   sigma:  0.0258
## 
##      AIC     AICc      BIC 
## 372.3936 390.3936 381.3552 
## 
## Error measures:
##                   ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
##                     ACF1
## Training set -0.07882461
## 
## Forecasts:
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       61334.90 59303.21  63366.58 58227.70  64442.09
## 6 Q2       64971.30 62529.36  67413.25 61236.67  68705.94
## 6 Q3       76718.11 73376.84  80059.37 71608.08  81828.13
## 6 Q4       99420.55 94372.29 104468.81 91699.90 107141.20

ii. Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.

accuracy(HWSales, DeptValid, test = 1)
##                 ME     RMSE      MAE        MPE      MAPE      MASE ACF1
## Test set -534.8988 534.8988 534.8988 -0.8797678 0.8797678 0.1716345   NA
##          Theil's U
## Test set       NaN
accuracy(HWSales, DeptValid, test = 2)
##               ME   RMSE    MAE        MPE      MAPE       MASE ACF1
## Test set -71.303 71.303 71.303 -0.1098659 0.1098659 0.02287919   NA
##           Theil's U
## Test set 0.01739097

c. The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from b and your generated figures, is this model suitable for forecasting quarters 21 and 22?

yrange = range(DeptStoreSales.TS - ValidLength)
plot(c(1,5.5), yrange, xlab="Year", ylab="Sales Dollars (in thousands)", ylim=c(50,90), bty="l", main="Exponential Smoothing: Actual Vs. Forecast (Training Data)")
lines(DeptStoreSales.TS/1000, bty="l", col = "blue", lwd=2)
lines(HWSales$fitted/1000, col="red", lwd=2)
legend(3,40, c("Actual", "Forecasted"), lty=c(1,1), col=c("blue", "red"), bty="n")

plot(HWSales$residuals, xlab = "Year", ylab = "Residual Values of Sales", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Exponential Smoothing Forecast Errors (Training Data)", bty = "n")
axis(1,at=seq(1,7,1), labels = format(seq(1,7,1)), pos = 0)
axis(2,at=seq(-0.06,0.07, .01),labels = format(seq(-6000,7000,1000)),las = 2)

Based on the data above, the forecast is a good predictor of quorters 21 and 22. The measured MAPE values are small and the residuals are also fairly low. Model seems to be fitting the dataset quite well.

d. Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the opposite order? Show a progression of time plots as you difference the data and each final series to provide evidence in support of your answer.

par(mfrow = c(2,2))
plot(DeptStoreSales.TS/1000, ylab = "Sales Dollars (in thousands)", xlab = "Year", bty = "l", main = "Department Store Sales")
plot(diff(DeptStoreSales.TS, lag=4),  ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
plot(diff(DeptStoreSales.TS,lag=1), ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
Lag4Lag1<- diff(diff(DeptStoreSales.TS, lag=4), lag=1)
plot(Lag4Lag1, ylab = "Lag-4, then Lag-1", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-4, Lag-1)")

par(mfrow = c(2,2))
plot(DeptStoreSales.TS/1000, ylab = "Sales Dollas (in thousands)", xlab = "Year", bty = "l", main = "Dept Store Sales")
plot(diff(DeptStoreSales.TS, lag=1),  ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
plot(diff(DeptStoreSales.TS, lag=4),  ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
Lag1Lag4<- diff(diff(DeptStoreSales.TS, lag=1), lag=4)
plot(Lag1Lag4, ylab = "Lag-1, then Lag-4", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-1, Lag-4)")

It appears as though there is no difference if you deseasonalize first and then detrend or vice versa.

e. Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.

DDForecast <-meanf(diff(diff(DeptTrain, lag = 4), lag = 1), h = 2)
DDForecast
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514
RealForecast <-vector()
for (i in 1:ValidLength) {
  if(i == 1) {
    RealForecast[i] <- DDForecast$mean[i] + DeptTrain[(TrainLength+i)-ValidLength] + (DeptTrain[TrainLength] - DeptTrain[TrainLength - ValidLength])
  } else {
    RealForecast[i] <- DDForecast$mean[i] + DeptTrain[(TrainLength+i)-ValidLength] + (RealForecast[i-1] - DeptTrain[TrainLength+i-1-ValidLength])
  }
}
RealForecast
## [1] 63982.2 68177.4      NA      NA
accuracy(RealForecast, DeptValid)
##               ME     RMSE    MAE       MPE     MAPE ACF1 Theil's U
## Test set -3229.8 3230.151 3229.8 -5.141902 5.141902 -0.5   0.13634

f. Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain..

Personally, I would pick the Holt-Winters’ model. The model is easier to use, took fewer steps to compute and understand. Additionally, MAPE score on the double-difference method is above 5%, whereas for the Holt-Winters’ model they are significantly lower.

g. What is an even simpler approach that should be compared as a baseline? Complete that comparison..

An even simpler approach would be to use seasonal naive forecast method, as there is clear seasonality to data:

DeptSnaive <- snaive(DeptTrain,h=ValidLength)
DeptSnaive
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 6 Q1          56405 51434.14 61375.86 48802.72 64007.28
## 6 Q2          60031 55060.14 65001.86 52428.72 67633.28
## 6 Q3          71486 66515.14 76456.86 63883.72 79088.28
## 6 Q4          92183 87212.14 97153.86 84580.72 99785.28
plot(c(1,7),yrange,type="n",xlab="Year",ylab="Sales Dollars (in thousands)",bty="l",xaxt="n",yaxt="n", main = "Comparison of Models")
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
axis(2, at=seq(0,110000,10000), labels=format(seq(0,110,10)), las=2)
lines(DeptStoreSales.TS)
lines(DeptSnaive$mean, col="green", lwd=2)
lines(HWSales$mean,lwd=2,col="red")
legend(2,100000, c("Actual","Holt-Winter's", "Seasonal Naive Forecast"),lwd=2, col=c("black", "red","green"), bty="n")

accuracy(DeptSnaive, DeptValid)
##                   ME     RMSE     MAE      MPE     MAPE     MASE
## Training set 2925.25 3878.784 3116.50 4.344358 4.737739 1.000000
## Test set     6482.25 7032.176 6482.25 8.170540 8.170540 2.079978
##                    ACF1 Theil's U
## Training set 0.54855812        NA
## Test set     0.01334402 0.4705349

Despite from being easier, as compared to the Holt-Winter’s method, seasonal naive forecasting method is not ideal for this situation. As evidenced from the chart above, seasonal naive method significantly underforecasts sales. Additionally, the MAPE of the Naive forecast is over 8% for the test set, which is not as good as in the Holt-Winter’s method performed.

Problem 8: Forecasting Australian Wine Sales.

A. Which smoothing method would you choose if you had to choose the same method for forecasting all series? Why?.

All of the 6wine types have some seasonality to them, and most have some trend. Because of that, I will need to use a method that can accomodate both of these conditions, making me select the Holt-Winter’s method

B. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next 2 months. Start by partitioning the data using the period until Dec-1993 as the training period. Apply Holt-Winter’s exponential smoothing (with multiplicative seasonality) to sales..

WineSales <- read.csv("AustralianWines.csv", stringsAsFactors = FALSE)
WineSales <- na.omit(WineSales)
WineSales.TS <- ts(WineSales$Fortified, start = c(1980, 1), frequency = 12)
WValidLength <- 12
WTrainLength <- length(WineSales.TS) - WValidLength
WSalesTrain <- window(WineSales.TS,end = c(1980, WTrainLength))
WSalesValid <- window(WineSales.TS, start = c(1980, WTrainLength + 1), end = c(1980, WTrainLength + WValidLength))
HWWine<-  hw(WSalesTrain, seasonal="multiplicative", h=12)
HWWine
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1994       1326.700 1178.965 1474.436 1100.758 1552.642
## Feb 1994       1566.896 1392.151 1741.641 1299.647 1834.145
## Mar 1994       1857.505 1650.032 2064.979 1540.202 2174.809
## Apr 1994       2003.232 1779.128 2227.335 1660.494 2345.969
## May 1994       2396.889 2128.312 2665.466 1986.136 2807.642
## Jun 1994       2426.159 2153.851 2698.467 2009.700 2842.618
## Jul 1994       2957.688 2625.158 3290.219 2449.127 3466.249
## Aug 1994       2736.162 2428.002 3044.322 2264.872 3207.452
## Sep 1994       2025.046 1796.568 2253.523 1675.620 2374.472
## Oct 1994       1879.638 1667.178 2092.097 1554.709 2204.566
## Nov 1994       2164.681 1919.545 2409.818 1789.778 2539.585
## Dec 1994       2276.860 2018.526 2535.193 1881.772 2671.947
accuracy(HWWine, WSalesValid)
##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -22.06217 280.3027 221.7110 -1.459040  7.260039 0.7967865
## Test set     115.04537 337.3427 265.2853  3.739657 11.246490 0.9533845
##                    ACF1 Theil's U
## Training set 0.03407816        NA
## Test set     0.01658673 0.7213012
HWWine$mean[1:2]
## [1] 1326.700 1566.896
plot(HWWine, ylab = "Sales (in thousands)")

C. Create a plot for the residuals from the Holt-Winter’s exponential smoothing.?.

plot(HWWine$residuals, xlab = "Year", ylab = "Residuals (in thousands of liters)", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Residuals of Fortified Wine Sales", bty = "n", col = "green")
axis(1,at=seq(1980,1994,1), labels = format(seq(1980,1994,1)))
axis(2,at=seq(-.5,.5,.05),labels = format(seq(-5000,5000,500)),las = 2)

checkresiduals(HWWine)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
## 
## Model df: 16.   Total lags used: 24

i. Based on this plot, which of the following statements are reasonable?.

Decembers (month 12) are not captured well by the model: Reasonable. Residuals tend to be very high in December months.

There is a strong correlation between sales on the same calendar month. Somewhat Reasonable. The actual and forecast models show the seasonality better than can be explained by the residual plot, but residuals are close to zero in some periods (for example, in the third quorter), which indicates the correlation by calendar month.

The model does not capture the seasonality well. Unreasonable. This model is designed to capture seasonality. Additionally, the residuals are off in opposite directions in Decembers of each year indicating that sales can vary is either direction, and seasonality is not the reason for that.

We should first deseasonalize the data and then apply Holt-Winter’s exponential smoothing. Unreasonable. This method is designed to handle seasonality as well as trend.

ii. How can you handle the above effect with exponential smoothing?.

While we are looking at monthly seasonality, it may be a good idea to also look at the annual seasonality. A Double-seasonal Holt-Winter’s model may be helpful with this, especially because of more than one seasonal pattern in a series as evidenced by the December numbers.