Chapter 5: #2, 5, 8

Question 2

Relationship between Moving Average and Exponential Smoothing: Assume that we apply a moving average to a series, using a very short window span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

To achieve an equivalent result with simple exponential smoothing, the smoothing constant would need to be close to 1. As alpha approaches 1, the weight shifts heavily toward the most recent observations, which corresponds to faster learning; as alpha approaches 0, older observations carry more weight and the model adapts more slowly. Using the relationship alpha = 2/(w+1), a window width of 1 implies alpha = 1 (the naive forecast), and a window width of 3 implies alpha = 0.5.
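As a quick illustration of that mapping (a minimal sketch; the window widths below are arbitrary examples, not values from the exercise):

#Smoothing constant implied by a trailing moving average of width w
w <- c(1, 3, 5, 9, 19)
alpha <- 2 / (w + 1)
data.frame(window.width = w, implied.alpha = alpha)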

Question 5

Forecasting Department Store Sales

(A)

Which of the following methods would not be suitable for forecasting this series? Explain why or why not for each one.

Moving average of raw series:

Not suitable - a moving average of the raw series cannot capture seasonality or trend, both of which are present in this data set.

Moving average of deseasonalized series:

Not suitable - while deseasonalizing removes the clear seasonality, a trend remains. The data would also need to be detrended for this method to work.

Simple exponential smoothing of the raw series:

Not suitable - simple exponential smoothing is appropriate only for series with no trend or seasonality (both would need to be removed prior to use).

Double exponential smoothing of the raw series:

Not suitable - this method could account for the trend in the data, but not the seasonality.

Holt-Winter’s exponential smoothing of the raw series:

Suitable - this model takes into account changes in trend, seasonality, and level over time. This is the only model here that should work well with the Department Store Sales data.

(B)

A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the ets function with the parameters restrict=FALSE and model = “ZZZ”, and use the smoothing constants alpha=0.2, beta=0.15, and gamma=0.05.

i. Run this method on the data. Request the forecasts on the validation period. (Note that the forecasted values for the validation set will be different than what the book shows.)

#libraries
library(forecast)
library(zoo)
library(dplyr)
library(timeDate)
DeptStore<- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
str(DeptStore)
## 'data.frame':    24 obs. of  2 variables:
##  $ Quarter: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sales  : int  50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
#Time_series
DeptStore.TS <- ts(DeptStore$Sales, start=c(1), frequency=4)
yrange = range(DeptStore.TS)

#Set training (quarters 1-20) and validation (quarters 21-24) periods
DeptStoreValidLength <- 4
DeptStoreTrainLength <- length(DeptStore.TS) - DeptStoreValidLength

#window() accepts a period index within the starting year, so c(1, 20) means quarter 20
DeptTrain.TS <- window(DeptStore.TS, start=c(1), end=c(1, DeptStoreTrainLength))
DeptValid.TS <- window(DeptStore.TS, start=c(1,DeptStoreTrainLength+1), end=c(1, DeptStoreValidLength+DeptStoreTrainLength))
Hwin <- ets(DeptTrain.TS, model = "ZZZ", alpha=0.2, beta=0.15, gamma=0.05, restrict = FALSE)
Hwin.pred <- forecast(Hwin, h = DeptStoreValidLength)
Hwin.pred
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       62115.16 60268.28  63962.04 59290.60  64939.72
## 6 Q2       65371.60 63317.68  67425.53 62230.40  68512.81
## 6 Q3       77076.87 74420.49  79733.24 73014.29  81139.45
## 6 Q4      102937.73 98935.69 106939.77 96817.13 109058.33

ii. Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.

#MAPE Values - Quarter 21
accuracy(Hwin.pred, DeptValid.TS, test = 1)
##                 ME     RMSE      MAE       MPE     MAPE      MASE ACF1
## Test set -1315.156 1315.156 1315.156 -2.163086 2.163086 0.4219979   NA
##          Theil's U
## Test set       NaN

The MAPE for quarter 21 is 2.16%.

#MAPE Values - Quarter 22
accuracy(Hwin.pred, DeptValid.TS, test = 2)
##                 ME     RMSE      MAE        MPE      MAPE      MASE ACF1
## Test set -471.6041 471.6041 471.6041 -0.7266627 0.7266627 0.1513249   NA
##          Theil's U
## Test set 0.1150254

The MAPE for quarter 22 is 0.73%.
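If a single figure covering both quarters is wanted, the test argument of accuracy also accepts a vector of indices (a minimal sketch reusing the objects above; output not shown):

#Combined accuracy over quarters 21-22
accuracy(Hwin.pred, DeptValid.TS, test = 1:2)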

(C)

The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?

#Fitted vs. actual sales in the training period (in thousands of dollars)
plot(c(1,7), c(0,105), type="n", xlab="Year", ylab="Sales Dollars (in thousands)", bty="l", main="Exp Smoothing: Actual vs. Fitted (Training Data)")
lines(DeptStore.TS/1000)
lines(Hwin$fitted/1000, col="red", lwd=2)
legend(3,40, c("Actual", "Fitted"), lty=c(1,1), col=c("black", "red"), bty="n")

#Training-period forecast errors (actual minus fitted), in dollars
plot(DeptTrain.TS - Hwin$fitted, xlab = "Year", ylab = "Residual Values of Sales (in dollars)", bty = "l", lwd = 2, main = "Exp Smoothing Forecast Errors (Training Data)")

Yes. The MAPE values for quarters 21 and 22 are very low, and the training-period residuals are small. The actual-versus-fitted chart shows the fitted values tracking the actual series closely, so this model should be a good predictor of quarters 21 and 22.

(D)

Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality, or the opposite order? Show a progression of time plots as you difference the data and each final series to provide evidence in support of your answer.

#Lag-4 (seasonal) difference first, then lag-1 (trend)
par(mfrow = c(2,2))
plot(DeptStore.TS/1000, ylab = "Sales Dollars (in thousands)", xlab = "Year", bty = "l", main = "Dept Store Sales")
plot(diff(DeptStore.TS, lag=4),  ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
plot(diff(DeptStore.TS,lag=1), ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
Lag4Lag1<- diff(diff(DeptStore.TS, lag=4), lag=1)
plot(Lag4Lag1, ylab = "Lag-4, then Lag-1", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-4, Lag-1)")

#Lag-1 (trend) difference first, then lag-4 (seasonal)
par(mfrow = c(2,2))
plot(DeptStore.TS/1000, ylab = "Sales Dollars (in thousands)", xlab = "Year", bty = "l", main = "Dept Store Sales")
plot(diff(DeptStore.TS, lag=1),  ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
plot(diff(DeptStore.TS, lag=4),  ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
Lag1Lag4<- diff(diff(DeptStore.TS, lag=1), lag=4)
plot(Lag1Lag4, ylab = "Lag-1, then Lag-4", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-1, Lag-4)")

The twice-differenced series are identical regardless of order: differencing is a linear operation, so removing trend first and then seasonality yields the same result as the reverse. Neither order is preferred, as the check below confirms.
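As a quick numerical check (a minimal sketch reusing the two double-differenced series created above), the two orderings can be compared element by element; the result should be TRUE:

#Confirm the two double-differenced series are identical
all.equal(as.numeric(Lag4Lag1), as.numeric(Lag1Lag4))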

(E)

Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.

#Double Difference for Q21-22
DDavg.pred<- meanf(diff(diff(DeptTrain.TS, lag = 4), lag = 1), h=2)
DDavg.pred
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514
#Re-integrate the forecasts: undo the lag-1 and lag-4 differencing.
#Only quarters 21-22 are forecast; the vector is kept at length 4 (NAs for
#quarters 23-24) so it lines up with the validation series in (f).
realForecasts <- rep(NA_real_, DeptStoreValidLength)

for (i in 1:2) {
  if(i == 1) {
    #Add back the same quarter last year (lag-4) and the last observed trend step (lag-1)
    realForecasts[i] <- DDavg.pred$mean[i] + DeptTrain.TS[(DeptStoreTrainLength+i)-DeptStoreValidLength] + (DeptTrain.TS[DeptStoreTrainLength] - DeptTrain.TS[DeptStoreTrainLength - DeptStoreValidLength])
  } else {
    #Later quarters build on the previously reconstructed forecast
    realForecasts[i] <- DDavg.pred$mean[i] + DeptTrain.TS[(DeptStoreTrainLength+i)-DeptStoreValidLength] + (realForecasts[i-1] - DeptTrain.TS[DeptStoreTrainLength+i-1-DeptStoreValidLength])
  }
}

#Forecast for Q21 and Q22:
realForecasts
## [1] 63982.2 68177.4      NA      NA

(F)

Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.

accuracy(realForecasts, DeptValid.TS)
##               ME     RMSE    MAE       MPE     MAPE ACF1 Theil's U
## Test set -3229.8 3230.151 3229.8 -5.141902 5.141902 -0.5        NA

I would opt for the Holt-Winter’s method. The MAPE for the double-differencing approach is above 5%, whereas the Holt-Winter’s MAPEs for quarters 21 and 22 were 2.16% and 0.73%, respectively. The Holt-Winter’s approach also requires fewer manual steps and offers more tuning options (the smoothing constants alpha, beta, and gamma).
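For a compact side-by-side comparison (a minimal sketch reusing the objects from (b) and (e); the MAPE formula is written out directly rather than taken from accuracy()):

#MAPE (%) over quarters 21-22 for each method
hw.mape <- 100 * mean(abs((DeptValid.TS[1:2] - Hwin.pred$mean[1:2]) / DeptValid.TS[1:2]))
dd.mape <- 100 * mean(abs((DeptValid.TS[1:2] - realForecasts[1:2]) / DeptValid.TS[1:2]))
round(c(Holt.Winters = hw.mape, Double.Differencing = dd.mape), 2)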

(G)

What is an even simpler approach that should be compared as a baseline? Complete that comparison.

#Seasonal naive baseline
DeptSnaive <- snaive(DeptTrain.TS,h=DeptStoreValidLength)
DeptSnaive
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 6 Q1          56405 51434.14 61375.86 48802.72 64007.28
## 6 Q2          60031 55060.14 65001.86 52428.72 67633.28
## 6 Q3          71486 66515.14 76456.86 63883.72 79088.28
## 6 Q4          92183 87212.14 97153.86 84580.72 99785.28
#plot comparison of methods
plot(c(1,7),yrange,type="n",xlab="Year",ylab="Sales Dollars (in thousands)",bty="l",xaxt="n",yaxt="n", main = "Comparison of Methods")
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
axis(2, at=seq(0,110000,10000), labels=format(seq(0,110,10)), las=2)
lines(DeptStore.TS,bty="l")
lines(DeptSnaive$mean, col="red", lwd=2, lty=2)
lines(Hwin.pred$mean,lwd=2,col="blue",lty=2)
legend(2,100000, c("Actual","Holt-Winter's", "Seasonal Naive Forecast"),lwd=2, lty=c(1,2,2), col=c("black", "blue","red"), bty="n")

#Accuracy of Snaive
accuracy(DeptSnaive, DeptValid.TS)
##                   ME     RMSE     MAE      MPE     MAPE     MASE
## Training set 2925.25 3878.784 3116.50 4.344358 4.737739 1.000000
## Test set     6482.25 7032.176 6482.25 8.170540 8.170540 2.079978
##                    ACF1 Theil's U
## Training set 0.54855812        NA
## Test set     0.01334402 0.4705349

An even simpler baseline is the seasonal naive forecast, which repeats each quarter’s value from the most recent year. It is easy to produce and uses the latest observation from the same season. The results above show a couple of things. First, the Holt-Winter’s forecasts are closer to the actual values than the seasonal naive forecasts. Second, the seasonal naive forecast is reasonable but under-predicts every quarter; because it only repeats the prior year, it cannot capture the year-over-year upward trend. Finally, its test-set MAPE of just over 8% is a useful gut-check benchmark, though clearly worse than the Holt-Winter’s method.

Question 8

Forecasting Australian Wine Sales: Figure 5.14 shows time plots of monthly sales of six types of Australian wines (red, rose, sweet white, dry white, sparkling, and fortified) for 1980-1994. Data available in AustralianWines.xls. The units are thousands of liters. You are hired to obtain short-term forecasts (2-3 months ahead) for each of the six series, and this task will be repeated every month.

(A)

Which smoothing method would you choose if you had to choose the same method for forecasting all series? Why?

All six wine series show some pattern of seasonality, and most also show a trend. We therefore need a method that can accommodate both, which leads me to choose the Holt-Winter’s method; a quick look at the six series is sketched below.
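As a quick visual check of that claim, the sketch below plots all six series on one page. It assumes the CSV has a month-label column named Month and that the remaining six columns are numeric sales series; adjust the column handling if the file differs.

#Plot all six wine series to inspect trend and seasonality
WineAll <- na.omit(read.csv("AustralianWines.csv", stringsAsFactors = FALSE))
series.names <- setdiff(names(WineAll), "Month")  #assumes the label column is named "Month"
par(mfrow = c(3, 2))
for (nm in series.names) {
  plot(ts(WineAll[[nm]], start = c(1980, 1), frequency = 12),
       xlab = "Year", ylab = "Thousands of liters", main = nm, bty = "l")
}
par(mfrow = c(1, 1))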

(B)

Fortified wine has the largest market share of the six types of wine. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next two months.

Start by partitioning the data, using the period until Dec-1993 as the training period.

#import dataset
WineSales <- read.csv("AustralianWines.csv", stringsAsFactors = FALSE)

#Review data - need to omit NA values
WineSales <- na.omit(WineSales)

#convert to TS
WineSales.TS <- ts(WineSales$Fortified, start = c(1980, 1), frequency = 12)
#partition data
WineValidLength <- 12
WineTrainLength <- length(WineSales.TS) - WineValidLength
WineSalesTrain <- window(WineSales.TS,end = c(1980, WineTrainLength))
WineSalesValid <- window(WineSales.TS, start = c(1980, WineTrainLength + 1), end = c(1980, WineTrainLength + WineValidLength))
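A quick sanity check (reusing the objects just created) that the training window ends where the question requires:

#Training period should end in December 1993
end(WineSalesTrain)  #expected: 1993 12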

Apply Holt-Winter’s exponential smoothing (with multiplicative seasonality) to sales.

#Apply Holt-Winter's exponential smoothing with multiplicative seasonality (ZZM model)
HW.Wine.ets<- ets(WineSalesTrain, model = "ZZM", restrict = FALSE)
HW.Wine.ets
## ETS(M,A,M) 
## 
## Call:
##  ets(y = WineSalesTrain, model = "ZZM", restrict = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.0555 
##     beta  = 9e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4040.0811 
##     b = -6.7983 
##     s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862
##            1.1463 1.1097 0.9345 0.8513 0.6996 0.5903
## 
##   sigma:  0.0859
## 
##      AIC     AICc      BIC 
## 2755.038 2759.118 2808.145
#Forecast through the 12-month validation period plus two more months (Jan-Feb 1995)
HWWine.predict <- forecast(HW.Wine.ets, h = WineValidLength + 2)
HWWine.predict$mean[13:14]
## [1] 1217.250 1435.458

Using this forecast, January and February 1995 sales are expected to be about 1,217 and 1,435 thousand liters, respectively.

#Plot validation & prediction months
plot(HWWine.predict, ylab = "Sales in thousands of liters")

#Accuracy
accuracy(HWWine.predict, WineSalesValid)
##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -25.32466 287.8687 224.6507 -1.317643  7.229271 0.8073515
## Test set     125.56906 328.9246 256.3940  4.443793 10.858860 0.9214307
##                     ACF1 Theil's U
## Training set  0.05168201        NA
## Test set     -0.01105575 0.7140459

The validation data has a MAPE of 10.86%.

(C)

Create a plot for the residuals from the Holt-Winter’s exponential smoothing.

Plot below.

#Residuals (actual minus fitted) for Fortified wine sales, in thousands of liters
plot(WineSalesTrain - HW.Wine.ets$fitted, xlab = "Year", ylab = "Residuals (in thousands of liters)", bty = "l", xaxt = "n", lwd = 1, main = "Residuals of Fortified Wine Sales", col = "blue")
axis(1,at=seq(1980,1994,1), labels = format(seq(1980,1994,1)))

i. Based on this plot, which of the following statements are reasonable?

Decembers (month 12) are not captured well by the model.

Reasonable. Residuals tend to be very high in December months.

There is a strong correlation between sales on the same calendar month.

Somewhat reasonable. The seasonality is easier to see in the actual and fitted series than in the residual plot, but the residuals do show a recurring pattern by calendar month: they are closer to zero (the model is closer to the actual) in some months each year, such as the third quarter, which points to correlation across the same calendar month.

The model does not capture the seasonality well.

Unreasonable. This model is designed to capture seasonality. Moreover, the residuals swing in opposite directions at times (for example in different Decembers), indicating that sales can miss in either direction; the seasonality itself is not to blame (or is inconsistent year over year).

We should first deseasonalize the data and then apply Holt-Winter’s exponential smoothing.

Unreasonable. Again, the Holt-Winter’s method is designed to handle seasonality internally, and as noted above, the seasonal pattern is not the main source of the larger errors.

ii. How can you handle the above effect with exponential smoothing?

A double-seasonal Holt-Winter’s model could be used to accommodate a secondary seasonal cycle. In wine terms, part of the December effect could relate to vintage quality, which varies from year to year, so it is unclear how predictable that component would be. Another option is to adjust the smoothing constants, in particular allowing a larger seasonal constant (gamma) so the seasonal component adapts more quickly; a rough sketch of that idea follows.
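The snippet below is a minimal sketch of the second idea: refitting the same multiplicative model while fixing a larger seasonal smoothing constant so the seasonal component (including the December effect) can adapt faster. gamma = 0.2 is an arbitrary illustrative value, not a tuned one.

#Refit with a larger, fixed seasonal smoothing constant (illustrative value, not tuned)
HW.Wine.flex <- ets(WineSalesTrain, model = "MAM", gamma = 0.2, restrict = FALSE)
accuracy(forecast(HW.Wine.flex, h = WineValidLength), WineSalesValid)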