Assignment

Due Monday, February 26th, 2018 at 11:59PM: Problems 2, 5 and 8 from Chapter 5 of Shmueli




Relationship between Moving Average and Exponential Smoothing

Assume that we apply a moving average to a series, using a very short window span.



(a) If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

To mimic an equivalent result from the moving average but using simple exponential smoothing, the smoothing constant would require a higher value — closer to 0.80. The reason is that in applying a moving average to a series with an incredibly short window span, the resulting forecast will show local trends. The greater alpha value means that more weight is placed on the most recent data, rather than a lower alpha value which would weight all data points more equally.













Forecasting Department Store Sales

The file DeptStoreSales.csv contains data on the quarterly sales for a department store over a 6-year period.



(a) Which of the following methods would not be suitable for forecasting this series? Explain why or why not for each one.

- Moving average of a raw series
The moving average would not be suitable for the raw series as it is clear from the time plot provided in the textbook that there is seasonality, as well as what appears to be a slight trend with the data. While it may be implemented, the results from the moving average would prove to be unreliable as it is best suited for data with no trend or seasonality.

- Moving average of deseasonalized series
Despite deseasonalizing the series, the moving average would still return an unreliable forecast as it must contend with the trend that it is evident. Therefore, unless the data series was also detrended, the moving average of a deseasonalized series would not be suitable for forecasting this series.

- Simple exponential smoothing of the raw series
This method is not suited for forecasting this series. To use simple exponential smoothing to the raw series, one would have to deseasonalize and detrend the data first as this method is better suited for data with no seasonality and no trend.

- Double exponential smoothing of the raw series
The Holt, or double exponential smoothing, method would not be suited for this data. While it may accommodate the trend that exists within the data, it can not account for the seasonality. So, in order to utilize the Holt method, the data would need to be deseasonalized beforehand.

- Holt-Winters’ exponential smoothing of the raw series
Because the Holt-Winters’, or triple exponential smoothing, method is able to handle both trend and seasaonlity it is therefore best suited for forecasting the department store sales data.



(b) A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the hw function with the parameter seasonal=“multiplicative”. Let the method pick the appropriate parameters for α, β, and γ.


i. Run this method on the data. Request the forecasts on the validation period. (Note that the forecasted values for the validation set will be different than what the book shows.)

First lets take a look at the plotted data.

#Upload of the DeptStoreSales.csv file that was then partitioned for Training and Validation Periods
dept_sales <- read.csv("DeptStoreSales.csv")

#Divide Sales by 1,000 to increase autoplot legibility
dept_sales <- mutate(dept_sales, sales = (Sales/1000))
dept_sales <- select(dept_sales, "sales")

#Create time series
dept_ts <- ts(dept_sales[,1], start = c(1,1), frequency = 4)

#Plot Department Store Sales
yrange = range(dept_ts)

plot(c(1, 7), c(0, yrange[2]+10), type="n", main="Department Store Sales by Quarter", xlab="Year", ylab="$ Sales (000s)", xaxt="n", yaxt="n")
lines(dept_ts, lwd=2)
axis(1, at=seq(1, 7, 1), labels=format(seq(1,7,1)))
axis(2, at=seq(0, 110, 10),las=2)

#Establish the Training and Validation periods for dept_ts
nValid <- 4
nTrain <- length(dept_ts) - nValid

deptTRAIN <- window(dept_ts, start = c(1,1), end = c(1, nTrain))
deptVALID <- window(dept_ts, start = c(1, nTrain + 1), end = c(1, nTrain + nValid))

#Replot Department Store Sales with line distinctions for Training and Validation periods
plot(c(1, 7), c(0, yrange[2]+10), type="n", main="Department Store Sales by Quarter", xlab="Year", ylab="$ Sales (000s)", xaxt="n", yaxt="n")
lines(dept_ts, lwd=2)
axis(1, at=seq(1, 7, 1), labels=format(seq(1,7,1)))
axis(2, at=seq(0, 110, 10),las=2)
abline(v=6)
arrows(1, 105, 6, 105, code=3, length=0.1)
text(3, 108, "Training")
abline(v=7)
arrows(6, 105, 7, 105, code=3, length=0.1)
text(6.5, 108, "Validation")

#Display Training and Validation Period data
pander(deptTRAIN, caption = "Training Period", split.table = Inf, row.names = paste('Year', 1:5))
Training Period
  Q1 Q2 Q3 Q4
Year 1 50.15 49.33 57.05 76.78
Year 2 48.62 50.9 58.52 77.69
Year 3 50.86 53.03 58.85 79.66
Year 4 51.64 54.12 65.68 85.17
Year 5 56.41 60.03 71.49 92.18
pander(deptVALID, caption = "Validation Period", split.table = Inf, row.names = "Year 6")
Validation Period
  Q1 Q2 Q3 Q4
Year 6 60.8 64.9 77 103.3
#Call the hw() method
deptHWmult <- hw(deptTRAIN, seasonal="multiplicative", h=4)
autoplot(deptTRAIN, main = "Holt-Winters Forecast on Department Store Sales by Quarter", ylim = c(0,110)) + xlab("Year") + ylab("$ Sales (000s)") + autolayer(deptHWmult, PI=FALSE, series="Holt-Winters Multiplicative") + autolayer(deptVALID, series="Validation Data")

summary(deptHWmult)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = deptTRAIN, h = 4, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.3845 
##     beta  = 0.155 
##     gamma = 0.4869 
## 
##   Initial states:
##     l = 58.0694 
##     b = 0.0504 
##     s=1.3101 0.9798 0.857 0.8531
## 
##   sigma:  0.0252
## 
##       AIC      AICc       BIC 
##  95.02871 113.02871 103.99030 
## 
## Error measures:
##                     ME     RMSE       MAE       MPE     MAPE     MASE
## Training set 0.3902652 1.470429 0.9580152 0.6432333 1.611164 0.307401
##                    ACF1
## Training set -0.1212042
## 
## Forecasts:
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       61.34083 59.35612  63.32554 58.30547  64.37619
## 6 Q2       65.01693 62.63655  67.39732 61.37645  68.65742
## 6 Q3       76.88570 73.61530  80.15610 71.88405  81.88735
## 6 Q4       99.53246 94.56957 104.49534 91.94238 107.12253


ii. Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.

#####Table 1.1 - Table w/ MAPE value for Q21#####

#Computing the MAPE values for Q21 from the Validation set
accuracy(deptHWmult, deptVALID, test = 1)
##                  ME      RMSE       MAE        MPE      MAPE      MASE
## Test set -0.5408292 0.5408292 0.5408292 -0.8895218 0.8895218 0.1735374
##          ACF1 Theil's U
## Test set   NA       NaN
Table 1.1 - Table w/ MAPE value for Q22
#Computing the MAPE value for Q22 from the Validation set
accuracy(deptHWmult, deptVALID, test = 2)
##                  ME      RMSE       MAE       MPE     MAPE       MASE ACF1
## Test set -0.1169336 0.1169336 0.1169336 -0.180175 0.180175 0.03752081   NA
##           Theil's U
## Test set 0.02852039



(c) The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?

#Reproduce the fit, as displayed in the book
autoplot(deptTRAIN, main = "Exp Smoothing: Actual Vs Forecast", series="Actual", ylim = c(0,110)) + xlab("Year") + ylab("$ Sales (000s)") + autolayer(deptHWmult$fitted, series="Forecast")

#Reproduce the residuals, as displayed in the book
autoplot(resid(deptHWmult), main = "Exp Smoothing Forecast Errors", xlab = "Year", ylab = "Forecast Error")

This model proves to be suitable for forecasting Q21 and Q22 based on several factors, including the low MAPE values and the model’s strong ability to fit to the data set.



(d) Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the the opposite order? Show the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.###

Part 1. Remove the trend and seasonal pattern.

dept_dtrend1 <- diff(dept_ts, lag=1)
dept_dseas1 <- diff(dept_dtrend1, lag=4)

par(mfrow = c(2,1))
plot(dept_dtrend1, main = "Department Store Sales by Quarter, Detrended", xlab="Year", ylab=NULL, yaxt='n') 
plot(dept_dseas1, main = "Department Store Sales by Quarter, Detrended THEN Deseasonalized", xlab="Year", ylab=NULL, yaxt='n') 

Part 2. Remove seasonal pattern and then the trend.

dept_dseas2 <- diff(dept_ts, lag=4)
dept_dtrend2 <- diff(dept_dseas2, lag=1)

par(mfrow = c(2,1))
plot(dept_dseas2, main = "Department Store Sales by Quarter, Deseasonalized", xlab="Year", ylab=NULL, yaxt='n')
plot(dept_dtrend2, main = "Department Store Sales by Quarter, Deseasonalized THEN Detrended", xlab="Year", ylab=NULL, yaxt='n') 

Part 3. Comparison of final products, as evidence to support claim.
Whether you deseasonalize first and then detrend, or vice versa, the end product returns similar results, as shown below following the processes from both Parts 1 & 2 above.

par(mfrow = c(2,1))
plot(dept_dseas1, main = "Department Store Sales by Quarter, Detrended THEN Deseasonalized", xlab="Year", ylab=NULL, yaxt='n') 
plot(dept_dtrend2, main = "Department Store Sales by Quarter, Deseasonalized THEN Detrended", xlab="Year", ylab=NULL, yaxt='n') 



(e) Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.**

# The meanf() function will return a point forecast (and prediction interval) for the stationary process. H=2 represents the parameter to forecast out 2 periods, Q21 and Q22.
pointForecasts <- meanf(diff(diff(deptTRAIN, lag=4), lag=1), h=2)

# Set up an empty vector
realForecasts <- vector()

for (i in 1:nValid) {
  if(i == 1) {
    realForecasts[i] <- pointForecasts$mean[i] + deptTRAIN[(nTrain+i)-nValid] + (deptTRAIN[nTrain] - deptTRAIN[nTrain - nValid])
  } else {
    realForecasts[i] <- pointForecasts$mean[i] + deptTRAIN[(nTrain+i)-nValid] + (realForecasts[i-1] - deptTRAIN[nTrain+i-1-nValid])
  }
}

# See what they look like
realForecasts
## [1] 63.9822 68.1774      NA      NA



(f) Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.**

From Holt-Winters’:

accuracy(deptHWmult, deptVALID, test=1:2)
##                  ME      RMSE       MAE        MPE      MAPE      MASE
## Test set -0.3288814 0.3912606 0.3288814 -0.5348484 0.5348484 0.1055291
##          ACF1  Theil's U
## Test set -0.5 0.02852039

From Double-Differencing:

accuracy(realForecasts, deptVALID)
##               ME     RMSE    MAE       MPE     MAPE ACF1 Theil's U
## Test set -3.2298 3.230151 3.2298 -5.141902 5.141902 -0.5   0.13634

Based on the lower MAPE scores for the Holt-Winters’ model, when compared to those of Double-Differencing, (along with the fit results of plotting against the Validation set) I would choose the Holt-Winters’ forecasting. Not only was it easier to compute, it was also easier to immediately interpret and understand, and easy to visualize.



(g) What is an even simpler approach that should be compared as a baseline? Complete that comparison.**

dept_snaive <- snaive(deptTRAIN,h=nValid)

plot(c(1, 7), c(0, yrange[2]+10), type="n", main="Department Store Sales by Quarter", xlab="Year", ylab="$ Sales (000s)", xaxt="n", yaxt="n")
lines(dept_ts, lwd=2)
axis(1, at=seq(1, 7, 1), labels=format(seq(1,7,1)))
axis(2, at=seq(0, 110, 10),las=2)
lines(deptHWmult$mean,col="red", lwd=2)
lines(dept_snaive$mean, col="orange", lwd=2)
legend("bottomright", c("Actual","Holt-Winters'", "Seasonal Naive"),bty="n", lwd=2, col=c("black", "red","orange"))

An even simpler approach would be looking at the seasonal naive forecast method. It is clear that there is seasonality to the data, which would lend itself well to the ease of running the seasonal naive. However, while it may be simpler, it is important to note from the plotted chart above that the Seasonal Naive is consistently underforecasting all of Year 6. In this particular instance, the Holt-Winters’ model fits the Validation period much better. However, one should also look at the MAPE values to confirm/deny the assumption. (side note: based on the MAPE values below, the Holt-Winter can be confirmed as a better model)

From Seasonal Naive:

accuracy(dept_snaive, deptVALID)
##                   ME     RMSE     MAE      MPE     MAPE     MASE
## Training set 2.92525 3.878784 3.11650 4.344358 4.737739 1.000000
## Test set     6.48225 7.032176 6.48225 8.170540 8.170540 2.079978
##                    ACF1 Theil's U
## Training set 0.54855812        NA
## Test set     0.01334402 0.4705349

From Holt-Winters’:

accuracy(deptHWmult, deptVALID)
##                     ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.3902652 1.470429 0.9580152 0.6432333 1.611164 0.3074010
## Test set     0.8145199 1.923091 1.1434013 0.6891346 1.223983 0.3668864
##                     ACF1 Theil's U
## Training set -0.12120425        NA
## Test set     -0.01525507 0.1251457













Forecasting Australian Wine Sales

You are hired to obtain short-term forecasts (2-3 months ahead) for each of the six series (red, rose, sweet white, dry white, sparkling, and fortified), and this task will be repeated every month.



(a) Which smoothing method would you choose if you had to choose the same method for forecasting all series? Why?

On first glance it is apparent that all six series have some seasonality to them. Some of the series have distinct trends to them (including rose, red, sparkling, dry white, and fortified), while the sweet wine has less distinct seasonality … particularly up until (what appears to be a massive surge in consumption in) 1986. Given that we must select a forecasting method for all six series and it must be repeated on a monthly basis, then I would be forced to select a method that can accommodate seasonality and trend. This would rule out Moving Averages and Simple Exponential Smoothing, as well as the Holt method — which does not lend itself well to account for seasonality.



(b) Fortified wine has the largest market share of the six types of wine. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next two months.

#Upload of the AustralianWines.csv file that was then partitioned for Training and Validation Periods
aus_data <- read.csv("AustralianWines.csv")

#Create time series
aus_wine <- ts(aus_data$Fortified, start = c(1980,1), frequency = 12)

#Establish the Training and Validation periods for aus_wines
nValid <- 4
nTrain <- length(aus_wine) - nValid

wineTRAIN <- window(aus_wine, start = c(1980,1), end = c(1993, 12))
wineVALID <- window(aus_wine, start = c(1994, 1), end = c(1994, 12))

#Plot Fortified Wine Sales with line distinctions for Training and Validation periods
yrange = range(aus_wine)

plot(c(1980, 1995), c(0, 7000), type="n", main="Sales of Fortified Wine in Australia from 1980 to 1995", xlab="Year", ylab="Liters (000s)", xaxt="n", yaxt="n")
lines(aus_wine, lwd=1, col="black")
axis(1, at=seq(1980, 1995, 1), labels=format(seq(1980, 1995, 1)))
axis(2, at=seq(0, 6000, 1000),las=2)
abline(v=1994)
arrows(1980, 6250, 1994, 6250, code=3, length=0.1)
text(1986.5, 6500, "T")
abline(v=1995)
arrows(1994, 6250, 1995, 6250, code=3, length=0.1)
text(1994.5, 6500, "V")

#Display Training and Validation Period data
pander(wineTRAIN, caption = "Training Period", split.table = Inf)
Training Period
  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1980 2585 3368 3210 3111 3756 4216 5225 4426 3932 3816 3661 3795
1981 2285 2934 2985 3646 4198 4935 5618 5454 3624 2898 3802 3629
1982 2369 2511 3079 3728 4151 4326 5054 5138 3310 3508 3790 3446
1983 2127 2523 3017 3265 3822 4027 4420 5255 4009 3074 3465 3718
1984 1954 2604 3626 2836 4042 3584 4225 4523 2892 2876 3420 3159
1985 2101 2181 2724 2954 4092 3470 3990 4239 2855 2897 3433 3307
1986 1914 2214 2320 2714 3633 3295 4377 4442 2774 2840 2828 3758
1987 1610 1968 2248 3262 3164 2972 4041 3402 2898 2555 3056 3717
1988 1755 2193 2198 2777 3076 3389 4231 3118 2524 2280 2862 3502
1989 1558 1940 2226 2676 3145 3224 4117 3446 2482 2349 2986 3163
1990 1651 1725 2622 2316 2976 3263 3951 2917 2380 2458 2883 2579
1991 1330 1686 2457 2514 2834 2757 3425 3006 2369 2017 2507 3168
1992 1545 1643 2112 2415 2862 2822 3260 2606 2264 2250 2545 2856
1993 1208 1412 1964 2018 2329 2660 2923 2626 2132 1772 2526 2755
pander(wineVALID, caption = "Validation Period", split.table = Inf)
Validation Period
  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1994 1154 1568 1965 2659 2354 2592 2714 2294 2416 2016 2799 2467
Table 1.1 - Table w/ MAPE value for Holt-Winter Forecast
#Call the hw() method
wineHWmult <- hw(wineTRAIN, seasonal="multiplicative", h=12)

#Test accuracy as compared to the Validation Period
accuracy(wineHWmult, wineVALID)
##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -22.06217 280.3027 221.7110 -1.459040  7.260039 0.7967865
## Test set     115.04537 337.3427 265.2853  3.739657 11.246490 0.9533845
##                    ACF1 Theil's U
## Training set 0.03407816        NA
## Test set     0.01658673 0.7213012
Table 1.1 - Forecast for the next two months
wineHWmult$mean[1:2]
## [1] 1326.700 1566.896
plot(c(1980, 1995), c(0, 7000), type="n", main="Sales of Fortified Wine in Australia from 1980 to 1995", xlab="Year", ylab="Liters (000s)", xaxt="n", yaxt="n")
lines(aus_wine, lwd=1, col="black")
lines(wineHWmult$mean, lwd=2, col="purple")
axis(1, at=seq(1980, 1995, 1), labels=format(seq(1980, 1995, 1)))
axis(2, at=seq(0, 6000, 1000),las=2)
abline(v=1994)
arrows(1980, 6250, 1994, 6250, code=3, length=0.1)
text(1986.5, 6500, "T")
abline(v=1995)
arrows(1994, 6250, 1995, 6250, code=3, length=0.1)
text(1994.5, 6500, "V")
legend("bottomleft", c("Actual","Holt-Winters'"),bty="n", lwd=2, col=c("black", "purple"))



(c) Create a plot for the residuals from the Holt-Winters’ exponential smoothing.

#Plot the residuals from HW
autoplot(resid(wineHWmult), main = "Residuals from Holt-Winters' multiplicative method", xlab = "Year", ylab = "Forecast Error")

checkresiduals(wineHWmult)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
## 
## Model df: 16.   Total lags used: 24

i. Based on this plot, which of the following statements are reasonable?
- Decembers (month 12) are not captured well by the model.
This statement is reasonable based on the drastic fluctuation on December residuals. - There is a strong correlation between sales on the same calendar month.
While there is seasonality to sales of Fortified Wine, I’m hesitant to declare that this is entirely reasonable. Considering the flucuations in residuals, it’s a little unclear. Especially given that there are some months that have a strong correlation between them. - The model does not capture the seasonality well
This statement doesn’t feel too reasonable. While perhaps not the most effective at capturing seasonality based on what we’ve seen in examples past, the Holt-Winters’ method is intended to be able to capture both trend and seasonality. Both of which are present in the Fortified Wine data set. - We should first deseasonalize the data and then apply Holt-Winters’ exponential smoothing.
Because Holt-Winters’ is equipped to capture both trend and seasonality then this statement feels unreasonable.


ii. How can you handle the above effect with exponential smoothing?
We are looking at month to month seasonality, but I wonder whether perhaps we extend that seasonality to explore annual seasonality. Given the impact that external factors may have on wine (i.e., temperature, vineyard conditions, etc.) it feels that an annual batch may skew the data set and potentially provide misleading results. I wonder how many times these participating Australian vineyards heard that their 1991 was a bottle of crap?