Assignment

Due Monday, February 26th, 2018 at 11:59PM: Problems 2, 5 and 8 from Chapter 5 of Shmueli

Relationship between Moving Average and Exponential Smoothing

Assume that we apply a moving average to a series, using a very short window span.

(a) If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

To mimic an equivalent result from the moving average but using simple exponential smoothing, the smoothing constant would require a higher value — closer to 0.80. The reason is that in applying a moving average to a series with an incredibly short window span, the resulting forecast will show local trends. The greater alpha value means that more weight is placed on the most recent data, rather than a lower alpha value which would weight all data points more equally.

Forecasting Department Store Sales

The file DeptStoreSales.csv contains data on the quarterly sales for a department store over a 6-year period.

(a) Which of the following methods would not be suitable for forecasting this series? Explain why or why not for each one.

- Moving average of a raw series
The moving average would not be suitable for the raw series as it is clear from the time plot provided in the textbook that there is seasonality, as well as what appears to be a slight trend with the data. While it may be implemented, the results from the moving average would prove to be unreliable as it is best suited for data with no trend or seasonality.

- Moving average of deseasonalized series
Despite deseasonalizing the series, the moving average would still return an unreliable forecast as it must contend with the trend that it is evident. Therefore, unless the data series was also detrended, the moving average of a deseasonalized series would not be suitable for forecasting this series.

- Simple exponential smoothing of the raw series
This method is not suited for forecasting this series. To use simple exponential smoothing to the raw series, one would have to deseasonalize and detrend the data first as this method is better suited for data with no seasonality and no trend.

- Double exponential smoothing of the raw series
The Holt, or double exponential smoothing, method would not be suited for this data. While it may accommodate the trend that exists within the data, it can not account for the seasonality. So, in order to utilize the Holt method, the data would need to be deseasonalized beforehand.

- Holt-Winters’ exponential smoothing of the raw series
Because the Holt-Winters’, or triple exponential smoothing, method is able to handle both trend and seasaonlity it is therefore best suited for forecasting the department store sales data.

(b) A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the hw function with the parameter seasonal=“multiplicative”. Let the method pick the appropriate parameters for α, β, and γ.

i. Run this method on the data. Request the forecasts on the validation period. (Note that the forecasted values for the validation set will be different than what the book shows.)

First lets take a look at the plotted data.

#Upload of the DeptStoreSales.csv file that was then partitioned for Training and Validation Periods
dept_sales <- read.csv("DeptStoreSales.csv")

#Divide Sales by 1,000 to increase autoplot legibility
dept_sales <- mutate(dept_sales, sales = (Sales/1000))
dept_sales <- select(dept_sales, "sales")

#Create time series
dept_ts <- ts(dept_sales[,1], start = c(1,1), frequency = 4)

#Plot Department Store Sales
yrange = range(dept_ts)

plot(c(1, 7), c(0, yrange[2]+10), type="n", main="Department Store Sales by Quarter", xlab="Year", ylab="$ Sales (000s)", xaxt="n", yaxt="n")
lines(dept_ts, lwd=2)
axis(1, at=seq(1, 7, 1), labels=format(seq(1,7,1)))
axis(2, at=seq(0, 110, 10),las=2)

#Establish the Training and Validation periods for dept_ts
nValid <- 4
nTrain <- length(dept_ts) - nValid

deptTRAIN <- window(dept_ts, start = c(1,1), end = c(1, nTrain))
deptVALID <- window(dept_ts, start = c(1, nTrain + 1), end = c(1, nTrain + nValid))

#Replot Department Store Sales with line distinctions for Training and Validation periods
plot(c(1, 7), c(0, yrange[2]+10), type="n", main="Department Store Sales by Quarter", xlab="Year", ylab="$ Sales (000s)", xaxt="n", yaxt="n")
lines(dept_ts, lwd=2)
axis(1, at=seq(1, 7, 1), labels=format(seq(1,7,1)))
axis(2, at=seq(0, 110, 10),las=2)
abline(v=6)
arrows(1, 105, 6, 105, code=3, length=0.1)
text(3, 108, "Training")
abline(v=7)
arrows(6, 105, 7, 105, code=3, length=0.1)
text(6.5, 108, "Validation")

#Display Training and Validation Period data
pander(deptTRAIN, caption = "Training Period", split.table = Inf, row.names = paste('Year', 1:5))

Training Period
	Q1	Q2	Q3	Q4
Year 1	50.15	49.33	57.05	76.78
Year 2	48.62	50.9	58.52	77.69
Year 3	50.86	53.03	58.85	79.66
Year 4	51.64	54.12	65.68	85.17
Year 5	56.41	60.03	71.49	92.18

pander(deptVALID, caption = "Validation Period", split.table = Inf, row.names = "Year 6")

Validation Period
	Q1	Q2	Q3	Q4
Year 6	60.8	64.9	77	103.3

#Call the hw() method
deptHWmult <- hw(deptTRAIN, seasonal="multiplicative", h=4)
autoplot(deptTRAIN, main = "Holt-Winters Forecast on Department Store Sales by Quarter", ylim = c(0,110)) + xlab("Year") + ylab("$ Sales (000s)") + autolayer(deptHWmult, PI=FALSE, series="Holt-Winters Multiplicative") + autolayer(deptVALID, series="Validation Data")

summary(deptHWmult)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = deptTRAIN, h = 4, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.3845 
##     beta  = 0.155 
##     gamma = 0.4869 
## 
##   Initial states:
##     l = 58.0694 
##     b = 0.0504 
##     s=1.3101 0.9798 0.857 0.8531
## 
##   sigma:  0.0252
## 
##       AIC      AICc       BIC 
##  95.02871 113.02871 103.99030 
## 
## Error measures:
##                     ME     RMSE       MAE       MPE     MAPE     MASE
## Training set 0.3902652 1.470429 0.9580152 0.6432333 1.611164 0.307401
##                    ACF1
## Training set -0.1212042
## 
## Forecasts:
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       61.34083 59.35612  63.32554 58.30547  64.37619
## 6 Q2       65.01693 62.63655  67.39732 61.37645  68.65742
## 6 Q3       76.88570 73.61530  80.15610 71.88405  81.88735
## 6 Q4       99.53246 94.56957 104.49534 91.94238 107.12253

ii. Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.

#####Table 1.1 - Table w/ MAPE value for Q21#####

#Computing the MAPE values for Q21 from the Validation set
accuracy(deptHWmult, deptVALID, test = 1)

##                  ME      RMSE       MAE        MPE      MAPE      MASE
## Test set -0.5408292 0.5408292 0.5408292 -0.8895218 0.8895218 0.1735374
##          ACF1 Theil's U
## Test set   NA       NaN

Table 1.1 - Table w/ MAPE value for Q22

#Computing the MAPE value for Q22 from the Validation set
accuracy(deptHWmult, deptVALID, test = 2)

##                  ME      RMSE       MAE       MPE     MAPE       MASE ACF1
## Test set -0.1169336 0.1169336 0.1169336 -0.180175 0.180175 0.03752081   NA
##           Theil's U
## Test set 0.02852039

(c) The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?

#Reproduce the fit, as displayed in the book
autoplot(deptTRAIN, main = "Exp Smoothing: Actual Vs Forecast", series="Actual", ylim = c(0,110)) + xlab("Year") + ylab("$ Sales (000s)") + autolayer(deptHWmult$fitted, series="Forecast")

#Reproduce the residuals, as displayed in the book
autoplot(resid(deptHWmult), main = "Exp Smoothing Forecast Errors", xlab = "Year", ylab = "Forecast Error")

This model proves to be suitable for forecasting Q21 and Q22 based on several factors, including the low MAPE values and the model’s strong ability to fit to the data set.

(d) Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the the opposite order? Show the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.###

Part 1. Remove the trend and seasonal pattern.

dept_dtrend1 <- diff(dept_ts, lag=1)
dept_dseas1 <- diff(dept_dtrend1, lag=4)

par(mfrow = c(2,1))
plot(dept_dtrend1, main = "Department Store Sales by Quarter, Detrended", xlab="Year", ylab=NULL, yaxt='n') 
plot(dept_dseas1, main = "Department Store Sales by Quarter, Detrended THEN Deseasonalized", xlab="Year", ylab=NULL, yaxt='n')

Part 2. Remove seasonal pattern and then the trend.

dept_dseas2 <- diff(dept_ts, lag=4)
dept_dtrend2 <- diff(dept_dseas2, lag=1)

par(mfrow = c(2,1))
plot(dept_dseas2, main = "Department Store Sales by Quarter, Deseasonalized", xlab="Year", ylab=NULL, yaxt='n')
plot(dept_dtrend2, main = "Department Store Sales by Quarter, Deseasonalized THEN Detrended", xlab="Year", ylab=NULL, yaxt='n')

Part 3. Comparison of final products, as evidence to support claim.
Whether you deseasonalize first and then detrend, or vice versa, the end product returns similar results, as shown below following the processes from both Parts 1 & 2 above.

par(mfrow = c(2,1))
plot(dept_dseas1, main = "Department Store Sales by Quarter, Detrended THEN Deseasonalized", xlab="Year", ylab=NULL, yaxt='n') 
plot(dept_dtrend2, main = "Department Store Sales by Quarter, Deseasonalized THEN Detrended", xlab="Year", ylab=NULL, yaxt='n')

(e) Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.**

# The meanf() function will return a point forecast (and prediction interval) for the stationary process. H=2 represents the parameter to forecast out 2 periods, Q21 and Q22.
pointForecasts <- meanf(diff(diff(deptTRAIN, lag=4), lag=1), h=2)

# Set up an empty vector
realForecasts <- vector()

for (i in 1:nValid) {
  if(i == 1) {
    realForecasts[i] <- pointForecasts$mean[i] + deptTRAIN[(nTrain+i)-nValid] + (deptTRAIN[nTrain] - deptTRAIN[nTrain - nValid])
  } else {
    realForecasts[i] <- pointForecasts$mean[i] + deptTRAIN[(nTrain+i)-nValid] + (realForecasts[i-1] - deptTRAIN[nTrain+i-1-nValid])
  }
}

# See what they look like
realForecasts

## [1] 63.9822 68.1774      NA      NA

(f) Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.**

From Holt-Winters’:

accuracy(deptHWmult, deptVALID, test=1:2)

##                  ME      RMSE       MAE        MPE      MAPE      MASE
## Test set -0.3288814 0.3912606 0.3288814 -0.5348484 0.5348484 0.1055291
##          ACF1  Theil's U
## Test set -0.5 0.02852039

From Double-Differencing:

accuracy(realForecasts, deptVALID)

##               ME     RMSE    MAE       MPE     MAPE ACF1 Theil's U
## Test set -3.2298 3.230151 3.2298 -5.141902 5.141902 -0.5   0.13634

Based on the lower MAPE scores for the Holt-Winters’ model, when compared to those of Double-Differencing, (along with the fit results of plotting against the Validation set) I would choose the Holt-Winters’ forecasting. Not only was it easier to compute, it was also easier to immediately interpret and understand, and easy to visualize.

(g) What is an even simpler approach that should be compared as a baseline? Complete that comparison.**

dept_snaive <- snaive(deptTRAIN,h=nValid)

plot(c(1, 7), c(0, yrange[2]+10), type="n", main="Department Store Sales by Quarter", xlab="Year", ylab="$ Sales (000s)", xaxt="n", yaxt="n")
lines(dept_ts, lwd=2)
axis(1, at=seq(1, 7, 1), labels=format(seq(1,7,1)))
axis(2, at=seq(0, 110, 10),las=2)
lines(deptHWmult$mean,col="red", lwd=2)
lines(dept_snaive$mean, col="orange", lwd=2)
legend("bottomright", c("Actual","Holt-Winters'", "Seasonal Naive"),bty="n", lwd=2, col=c("black", "red","orange"))

An even simpler approach would be looking at the seasonal naive forecast method. It is clear that there is seasonality to the data, which would lend itself well to the ease of running the seasonal naive. However, while it may be simpler, it is important to note from the plotted chart above that the Seasonal Naive is consistently underforecasting all of Year 6. In this particular instance, the Holt-Winters’ model fits the Validation period much better. However, one should also look at the MAPE values to confirm/deny the assumption. (side note: based on the MAPE values below, the Holt-Winter can be confirmed as a better model)

From Seasonal Naive:

accuracy(dept_snaive, deptVALID)

##                   ME     RMSE     MAE      MPE     MAPE     MASE
## Training set 2.92525 3.878784 3.11650 4.344358 4.737739 1.000000
## Test set     6.48225 7.032176 6.48225 8.170540 8.170540 2.079978
##                    ACF1 Theil's U
## Training set 0.54855812        NA
## Test set     0.01334402 0.4705349

From Holt-Winters’:

accuracy(deptHWmult, deptVALID)

##                     ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.3902652 1.470429 0.9580152 0.6432333 1.611164 0.3074010
## Test set     0.8145199 1.923091 1.1434013 0.6891346 1.223983 0.3668864
##                     ACF1 Theil's U
## Training set -0.12120425        NA
## Test set     -0.01525507 0.1251457

Forecasting Australian Wine Sales

You are hired to obtain short-term forecasts (2-3 months ahead) for each of the six series (red, rose, sweet white, dry white, sparkling, and fortified), and this task will be repeated every month.

(a) Which smoothing method would you choose if you had to choose the same method for forecasting all series? Why?

On first glance it is apparent that all six series have some seasonality to them. Some of the series have distinct trends to them (including rose, red, sparkling, dry white, and fortified), while the sweet wine has less distinct seasonality … particularly up until (what appears to be a massive surge in consumption in) 1986. Given that we must select a forecasting method for all six series and it must be repeated on a monthly basis, then I would be forced to select a method that can accommodate seasonality and trend. This would rule out Moving Averages and Simple Exponential Smoothing, as well as the Holt method — which does not lend itself well to account for seasonality.

(b) Fortified wine has the largest market share of the six types of wine. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next two months.

#Upload of the AustralianWines.csv file that was then partitioned for Training and Validation Periods
aus_data <- read.csv("AustralianWines.csv")

#Create time series
aus_wine <- ts(aus_data$Fortified, start = c(1980,1), frequency = 12)

#Establish the Training and Validation periods for aus_wines
nValid <- 4
nTrain <- length(aus_wine) - nValid

wineTRAIN <- window(aus_wine, start = c(1980,1), end = c(1993, 12))
wineVALID <- window(aus_wine, start = c(1994, 1), end = c(1994, 12))

#Plot Fortified Wine Sales with line distinctions for Training and Validation periods
yrange = range(aus_wine)

plot(c(1980, 1995), c(0, 7000), type="n", main="Sales of Fortified Wine in Australia from 1980 to 1995", xlab="Year", ylab="Liters (000s)", xaxt="n", yaxt="n")
lines(aus_wine, lwd=1, col="black")
axis(1, at=seq(1980, 1995, 1), labels=format(seq(1980, 1995, 1)))
axis(2, at=seq(0, 6000, 1000),las=2)
abline(v=1994)
arrows(1980, 6250, 1994, 6250, code=3, length=0.1)
text(1986.5, 6500, "T")
abline(v=1995)
arrows(1994, 6250, 1995, 6250, code=3, length=0.1)
text(1994.5, 6500, "V")

#Display Training and Validation Period data
pander(wineTRAIN, caption = "Training Period", split.table = Inf)

Training Period
	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
1980	2585	3368	3210	3111	3756	4216	5225	4426	3932	3816	3661	3795
1981	2285	2934	2985	3646	4198	4935	5618	5454	3624	2898	3802	3629
1982	2369	2511	3079	3728	4151	4326	5054	5138	3310	3508	3790	3446
1983	2127	2523	3017	3265	3822	4027	4420	5255	4009	3074	3465	3718
1984	1954	2604	3626	2836	4042	3584	4225	4523	2892	2876	3420	3159
1985	2101	2181	2724	2954	4092	3470	3990	4239	2855	2897	3433	3307
1986	1914	2214	2320	2714	3633	3295	4377	4442	2774	2840	2828	3758
1987	1610	1968	2248	3262	3164	2972	4041	3402	2898	2555	3056	3717
1988	1755	2193	2198	2777	3076	3389	4231	3118	2524	2280	2862	3502
1989	1558	1940	2226	2676	3145	3224	4117	3446	2482	2349	2986	3163
1990	1651	1725	2622	2316	2976	3263	3951	2917	2380	2458	2883	2579
1991	1330	1686	2457	2514	2834	2757	3425	3006	2369	2017	2507	3168
1992	1545	1643	2112	2415	2862	2822	3260	2606	2264	2250	2545	2856
1993	1208	1412	1964	2018	2329	2660	2923	2626	2132	1772	2526	2755

pander(wineVALID, caption = "Validation Period", split.table = Inf)

Validation Period
	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
1994	1154	1568	1965	2659	2354	2592	2714	2294	2416	2016	2799	2467

Table 1.1 - Table w/ MAPE value for Holt-Winter Forecast

#Call the hw() method
wineHWmult <- hw(wineTRAIN, seasonal="multiplicative", h=12)

#Test accuracy as compared to the Validation Period
accuracy(wineHWmult, wineVALID)

##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -22.06217 280.3027 221.7110 -1.459040  7.260039 0.7967865
## Test set     115.04537 337.3427 265.2853  3.739657 11.246490 0.9533845
##                    ACF1 Theil's U
## Training set 0.03407816        NA
## Test set     0.01658673 0.7213012

Table 1.1 - Forecast for the next two months

wineHWmult$mean[1:2]

## [1] 1326.700 1566.896

plot(c(1980, 1995), c(0, 7000), type="n", main="Sales of Fortified Wine in Australia from 1980 to 1995", xlab="Year", ylab="Liters (000s)", xaxt="n", yaxt="n")
lines(aus_wine, lwd=1, col="black")
lines(wineHWmult$mean, lwd=2, col="purple")
axis(1, at=seq(1980, 1995, 1), labels=format(seq(1980, 1995, 1)))
axis(2, at=seq(0, 6000, 1000),las=2)
abline(v=1994)
arrows(1980, 6250, 1994, 6250, code=3, length=0.1)
text(1986.5, 6500, "T")
abline(v=1995)
arrows(1994, 6250, 1995, 6250, code=3, length=0.1)
text(1994.5, 6500, "V")
legend("bottomleft", c("Actual","Holt-Winters'"),bty="n", lwd=2, col=c("black", "purple"))

(c) Create a plot for the residuals from the Holt-Winters’ exponential smoothing.

#Plot the residuals from HW
autoplot(resid(wineHWmult), main = "Residuals from Holt-Winters' multiplicative method", xlab = "Year", ylab = "Forecast Error")

checkresiduals(wineHWmult)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
## 
## Model df: 16.   Total lags used: 24

i. Based on this plot, which of the following statements are reasonable?
- Decembers (month 12) are not captured well by the model.
This statement is reasonable based on the drastic fluctuation on December residuals. - There is a strong correlation between sales on the same calendar month.
While there is seasonality to sales of Fortified Wine, I’m hesitant to declare that this is entirely reasonable. Considering the flucuations in residuals, it’s a little unclear. Especially given that there are some months that have a strong correlation between them. - The model does not capture the seasonality well
This statement doesn’t feel too reasonable. While perhaps not the most effective at capturing seasonality based on what we’ve seen in examples past, the Holt-Winters’ method is intended to be able to capture both trend and seasonality. Both of which are present in the Fortified Wine data set. - We should first deseasonalize the data and then apply Holt-Winters’ exponential smoothing.
Because Holt-Winters’ is equipped to capture both trend and seasonality then this statement feels unreasonable.

ii. How can you handle the above effect with exponential smoothing?
We are looking at month to month seasonality, but I wonder whether perhaps we extend that seasonality to explore annual seasonality. Given the impact that external factors may have on wine (i.e., temperature, vineyard conditions, etc.) it feels that an annual batch may skew the data set and potentially provide misleading results. I wonder how many times these participating Australian vineyards heard that their 1991 was a bottle of crap?

Week 5 : Beginning February 19th, 2018

Pete Wiernusz

2/24/2018