Problems: 2, 5, 8

2. Relationship between Moving Average and Exponential Smoothing: Assume that we apply a moving average to a series, using a very short window span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

If a moving average with a narrow window is applied to a series, each forecast is an average of only the last few observations, so the forecast tracks local movements closely and short-lived patterns show up clearly even in the averaged values. To mimic that behavior with simple exponential smoothing, an analyst should choose a high smoothing constant, roughly 0.7 through 0.9. (By the common correspondence alpha = 2/(w + 1) between a window of width w and the smoothing constant, a two-period window corresponds to alpha of about 0.67, and the shorter the window, the closer alpha should be to 1.) With such a large smoothing constant, the forecast adjusts heavily toward the most recent observations and shows the same local trends as a narrow-window moving average.
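
As a quick illustration (a minimal sketch on a simulated series, not part of the assignment data), a short-window trailing moving average and simple exponential smoothing with a high alpha produce very similar fitted values:

library(zoo)
library(forecast)
set.seed(1)
x <- ts(cumsum(rnorm(60)) + 100)               # simulated series for illustration only
ma3fit <- rollmean(x, k = 3, align = "right")  # trailing 3-period moving average
sesfit <- ses(x, alpha = 0.7, h = 1)           # simple exponential smoothing with a high alpha
# The SES fitted values track the short moving average closely
plot(x, col = "grey")
lines(ma3fit, col = "orange", lty = 3, lwd = 2)
lines(fitted(sesfit), col = "blue", lty = 2, lwd = 2)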

5. A) Which of the following methods would not be suitable for forecasting this series? Explain why or why not for each one.

Moving average of the raw series

The moving average takes an average across a window of consecutive periods to generate a forecast. This works well for data with no trend or seasonality, since the method cannot capture either pattern in its forecasts. The raw department store sales data are therefore not appropriate for a moving average, because the series contains both trend and seasonality. We could generate forecasts with this method, but they would not give us reliable information.

Simple exponential smoothing of the raw series

Like the moving average, this method should only be used for data that has no trend or seasonality (i.e., a stationary series). If someone really wanted to use this method, they could use double differencing to remove the seasonality and trend from the data and then add them back after generating the forecasts. Even then, the user would need to choose the smoothing constant, which can be challenging to do without overfitting the model.

Moving average of deseasonalized series

A moving average of the deseasonalized series would still not work for the department store sales data because the series also includes a trend. The data would need to be detrended as well; the moving average could then be applied, after which the seasonality and trend would need to be added back to the forecasts.

Double exponential smoothing of the raw series

Double exponential smoothing would not work on the raw department store sales series because the method captures trend but not seasonality. The data would need to be deseasonalized first; the trend can be left in, since double exponential smoothing handles it directly.

Holt-Winter’s exponential smoothing of the raw series

Holt-Winter’s exponential smoothing will work because it can handle both the seasonality and the trend in a raw series. The department store sales data contain both components, so this is the only one of the listed methods that works on the raw series without first applying differencing or double-differencing.

5. B) A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing.

i. Run this method on the data. Request the forecasts on the validation period.

setwd("C:/Users/larms.LA-INSP5559/Documents/R/win-library/3.3/17_0320_assignment4")
DeptSales<- read.csv("DeptSales.csv", stringsAsFactors = FALSE)
#I checked out the structure and header of the data series. 
str(DeptSales)
## 'data.frame':    24 obs. of  2 variables:
##  $ Quarter: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sales  : int  50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
head(DeptSales)
##   Quarter Sales
## 1       1 50147
## 2       2 49325
## 3       3 57048
## 4       4 76781
## 5       5 48617
## 6       6 50898
#I created an initial time series and plot to take a look at the data.  
SalesTS<- ts(DeptSales$Sales, start = c(1), frequency = 4)
yrange = range(SalesTS)
plot(c(1,7), yrange, type = "n", xlab = "Year", ylab = "Dept. Store Sales ($ Thousands)", bty = "l", xaxt = "n", yaxt = "n")
lines(SalesTS,bty = "l")
axis(1,at=seq(1,7,1), labels = format(seq(1,7,1)))
axis(2,at=seq(48000,105000,5000),labels = format(seq(48.0,105.0,5)),las = 2)

#Need to partition the data into training and validation sets

#Determines the training and validation length
validationlen<- 4
traininglen<- length(SalesTS)-validationlen

#Sets the actual periods for training and validation sets
salestrain<- window(SalesTS,end = c(1,traininglen))
salesvalid<- window(SalesTS, start = c(1,traininglen + 1))
#Load the required libraries
library(forecast)
library(zoo)
library(dplyr)
#Fit Holt-Winter's exponential smoothing on the training set; model = "ZZZ" lets ets() select
#the error/trend/seasonal form, with the smoothing constants fixed at the chosen values
HWets<- ets(salestrain, model = "ZZZ", alpha = 0.2, beta = 0.15, gamma = 0.05, restrict = FALSE)
#Generate forecasts for the validation period
HWnewforecast<- forecast(HWets, h = validationlen)
HWnewforecast
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       62115.16 60268.28  63962.04 59290.60  64939.72
## 6 Q2       65371.60 63317.68  67425.53 62230.40  68512.81
## 6 Q3       77076.87 74420.49  79733.24 73014.29  81139.45
## 6 Q4      102937.73 98935.69 106939.77 96817.13 109058.33

ii. Using the forecasts for the validation set, compute the MAPE values for the forecasts of quarters 21-22.

accuracy(HWnewforecast, salesvalid)
##                     ME      RMSE       MAE        MPE      MAPE      MASE
## Training set  609.5633 1447.1798 1240.9947  0.9315256 1.9850921 0.3982014
## Test set     -366.8394  727.6403  566.4741 -0.6517749 0.8449629 0.1817661
##                     ACF1  Theil's U
## Training set 0.008503999         NA
## Test set     0.183050115 0.02380343
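
The accuracy() output above reports MAPE over the whole validation period. Since the question asks specifically about quarters 21-22, a small additional computation (a sketch using the objects already created above) restricts the MAPE to the first two validation quarters:

#MAPE for quarters 21-22 only (the first two validation quarters)
mape2122 <- mean(abs((salesvalid[1:2] - HWnewforecast$mean[1:2]) / salesvalid[1:2])) * 100
mape2122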

5. C) The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all of the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?

yrange = range(SalesTS)
plot(c(1,7), yrange, type = "n", xlab = "Year", ylab = "Dept. Store Sales ($ Thousands)", bty = "l", xaxt = "n", yaxt = "n")
axis(1,at=seq(1,7,1), labels = format(seq(1,7,1)))
axis(2,at=seq(48000,105000,5000),labels = format(seq(48.0,105.0,5)),las = 2)
lines(SalesTS,bty = "l")
lines(HWets$fitted, col = "orange", lty = 3, lwd = 2)
lines(HWnewforecast$mean, col = "blue", lty = 2, lwd = 2)

#plot residuals
plot(HWets$residuals, xlab = "Year", ylab = "Dept. Store Sales Residuals", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Residuals")
axis(1,at=seq(1,7,1), labels = format(seq(1,7,1)), pos = 0)
axis(2,at=seq(-0.05,0.07, .01),labels = format(seq(-5000,7000,1000)),las = 2)

Based on the fitted-versus-actual plot, the residual plot, and the validation MAPE of about 0.84% from (b), the model tracks the series closely and appears suitable for forecasting quarters 21 and 22.

5. D) Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the opposite order? Show the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.

#Plot the Lag-4 difference, the Lag-1 difference, and the twice-differenced series (Lag-4 first, then Lag-1)
par(mfrow = c(2,2))
plot(SalesTS, ylab = "Sales", xlab = "Year", bty = "l", main = "Dept. Store Sales ($ Thousands)")
plot(diff(SalesTS, lag=4),  ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
plot(diff(SalesTS,lag=1), ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
Saleslag4to1<- diff(diff(SalesTS, lag=4), lag=1)
plot(Saleslag4to1, ylab = "Lag-4, then Lag-1", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-4, Lag-1)")

#Plot the Lag-1 difference, the Lag-4 difference, and the twice-differenced series (Lag-1 first, then Lag-4)
par(mfrow = c(2,2))
plot(SalesTS, ylab = "Sales", xlab = "Year", bty = "l", main = "Dept. Store Sales ($ Thousands)")
plot(diff(SalesTS, lag=1),  ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
plot(diff(SalesTS, lag=4),  ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
Saleslag1to4<- diff(diff(SalesTS, lag=1), lag=4)
plot(Saleslag1to4, ylab = "Lag-1, then Lag-4", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-1, Lag-4)")

As the plots above show, the order of differencing does not matter: applying a Lag-1 difference and then a Lag-4 difference produces exactly the same final series as applying the Lag-4 difference first, because the two difference operators commute.
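
A quick numerical check (using the two twice-differenced objects created above) confirms that the two orderings give an identical series:

#The two orderings of differencing produce the same twice-differenced series
all.equal(Saleslag4to1, Saleslag1to4)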

5. E) Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.

ddaverageforecast<- meanf(diff(diff(salestrain, lag = 4), lag = 1), h=2)
ddaverageforecast
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514
#Convert the differenced forecasts back to the scale of the original series by undoing
#the Lag-1 and Lag-4 differencing (only the first two entries are filled, since h = 2)
emptyvector<- vector()

for (i in 1:validationlen) {
  if (i == 1) {
    emptyvector[i] <- ddaverageforecast$mean[i] +
      salestrain[(traininglen + i) - validationlen] +
      (salestrain[traininglen] - salestrain[traininglen - validationlen])
  } else {
    emptyvector[i] <- ddaverageforecast$mean[i] +
      salestrain[(traininglen + i) - validationlen] +
      (emptyvector[i - 1] - salestrain[traininglen + i - 1 - validationlen])
  }
}

emptyvector
## [1] 63982.2 68177.4      NA      NA

5. F) Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.

I would choose the Holt-Winter’s forecast, since its forecasts are closer to the actual values. However, it would be important to compare the MAPE on the validation period as well, to make sure the model is not overfitting the training data.

emptyvector
## [1] 63982.2 68177.4      NA      NA
HWnewforecast
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 6 Q1       62115.16 60268.28  63962.04 59290.60  64939.72
## 6 Q2       65371.60 63317.68  67425.53 62230.40  68512.81
## 6 Q3       77076.87 74420.49  79733.24 73014.29  81139.45
## 6 Q4      102937.73 98935.69 106939.77 96817.13 109058.33
#Plot Forecasts to review 
par(mfrow = c(1,1))
plot(emptyvector, type = "l", bty = "l", xaxt = "n", yaxt = "n", xlab = "Quarter", ylab = "Dept. Store Sales ($ Thousands)", main = "Forecast from 5E")
axis(1, at = seq(1, 4, 1), labels = format(seq(21, 24, 1)))
axis(2, at = seq(60000, 70000, 2000), labels = format(seq(60, 70, 2)), las = 2)

plot(HWnewforecast, ylab = "Sales", xlab = "Year", bty = "l", main = "Forecast from 5B")
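
To put numbers behind this choice, the two sets of forecasts for quarters 21-22 can also be scored directly against the actuals (a sketch using accuracy() with the objects created above):

#Accuracy of the double-differenced average forecasts for quarters 21-22
accuracy(emptyvector[1:2], salesvalid[1:2])
#Accuracy of the Holt-Winter's forecasts for quarters 21-22
accuracy(HWnewforecast$mean[1:2], salesvalid[1:2])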

5. G) What is an even simpler approach that should be compared as a baseline? Complete that comparison.

A seasonal naive forecast should always be considered as a baseline comparison because it is the simplest forecast that uses the most recent information (here, the same quarter from the previous year). Oftentimes the most recent information is the most valuable input to a forecast, though sometimes it is the least. Since this forecast is simple to understand and easy to apply, it is a good baseline when reviewing the predictive performance of any forecasting method.

#Generate naive forecast
validationlen<- 4
traininglen<- length(SalesTS)-validationlen
salestrain<- window(SalesTS,end = c(1,traininglen))
salesvalid<- window(SalesTS, start = c(1,traininglen + 1))
naivevalid <- snaive(salestrain, h = validationlen)
naivevalid
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 6 Q1          56405 51434.14 61375.86 48802.72 64007.28
## 6 Q2          60031 55060.14 65001.86 52428.72 67633.28
## 6 Q3          71486 66515.14 76456.86 63883.72 79088.28
## 6 Q4          92183 87212.14 97153.86 84580.72 99785.28
tail(DeptSales, 6)
##    Quarter  Sales
## 19      19  71486
## 20      20  92183
## 21      21  60800
## 22      22  64900
## 23      23  76997
## 24      24 103337
yrange = range(SalesTS)
plot(c(1,7), yrange, type = "n", xlab = "Year", ylab = "Dept. Store Sales ($ Thousands)", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Dept. Store Sales - Naive Forecast")

axis(1,at=seq(1,7,1), labels = format(seq(1,7,1)))
axis(2,at=seq(40000,105000,5000),labels = format(seq(40.0,105.0,5)),las = 2)
lines(SalesTS, bty = "l")
lines(naivevalid$fitted, col = "red", bty = "l")
lines(naivevalid$mean, lwd = 2, lty = 2, col = "blue")
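
To complete the comparison numerically, the seasonal naive forecasts can be scored on the same validation period (a sketch using accuracy(), for comparison with the MAPE values reported in (b)):

#Accuracy of the seasonal naive forecasts on the validation period
accuracy(naivevalid, salesvalid)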

8. Forecasting Australian Wine Sales: Figure 5.14 shows time plots of monthly sales of six types of Australian wines (red, rose, sweet white, dry white, sparkling, and fortified) for 1980-1994. Data available in AustralianWines.xls. The units are thousands of liters. You are hired to obtain short-term forecasts (2-3 months ahead) for each of the six series, and this task will be repeated every month.

8. A) Which smoothing method would you choose if you had to choose the same method for forecasting all series? Why?

When I reviewed the plots in the book for the various types of wine, I found that some series had both trend and seasonality, some had seasonality only, some trend only, and some neither. Given that mix, Holt-Winter’s would be the best single smoothing method for this data set because it can handle data with both components, in either an additive or a multiplicative form, and the automated ets() procedure used below can drop components that a particular series does not need.
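
As an illustration only (assuming the AustralianWines data have been read into the WineSales data frame exactly as in part (b) below), the same automated exponential smoothing call could be applied to every series in one pass:

#Fit an automated exponential smoothing model to each of the six wine series
winecols <- c("Fortified", "Red", "Rose", "sparkling", "Sweet.white", "Dry.white")
winefits <- lapply(winecols, function(col) {
  seriests <- ts(WineSales[[col]], start = c(1980, 1), frequency = 12)
  ets(seriests, model = "ZZZ", restrict = FALSE)
})
names(winefits) <- winecols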

8. B) Fortified wine has the largest market share of the six types of wine. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next two months.

Start by partitioning the data, using the period until Dec-1993 as the training period. Then apply Holt-Winter’s exponential smoothing (with multiplicative seasonality) to sales.

WineSales<- read.csv("Awines.csv", stringsAsFactors = FALSE)
head(WineSales)
##    Month Fortified  Red Rose sparkling Sweet.white Dry.white
## 1 Jan-80      2585  464  112      1686          85      1954
## 2 Feb-80      3368  675  118      1591          89      2302
## 3 Mar-80      3210  703  129      2304         109      3054
## 4 Apr-80      3111  887   99      1712          95      2414
## 5 May-80      3756 1139  116      1471          91      2226
## 6 Jun-80      4216 1077  168      1377          95      2725
str(WineSales)
## 'data.frame':    188 obs. of  7 variables:
##  $ Month      : chr  "Jan-80" "Feb-80" "Mar-80" "Apr-80" ...
##  $ Fortified  : int  2585 3368 3210 3111 3756 4216 5225 4426 3932 3816 ...
##  $ Red        : int  464 675 703 887 1139 1077 1318 1260 1120 963 ...
##  $ Rose       : int  112 118 129 99 116 168 118 129 205 147 ...
##  $ sparkling  : int  1686 1591 2304 1712 1471 1377 1966 2453 1984 2596 ...
##  $ Sweet.white: int  85 89 109 95 91 95 96 128 124 111 ...
##  $ Dry.white  : int  1954 2302 3054 2414 2226 2725 2589 3470 2400 3180 ...
winesalests<- ts(WineSales$Fortified, start = c(1980, 1), frequency = 12)

#Partition the data, holding out the last 12 months as the validation period
winevalidationlen<- 12
winetraininglen<- length(winesalests)-winevalidationlen
winesalestrain<- window(winesalests,end = c(1980, winetraininglen))
winesalesvalid<- window(winesalests, start = c(1980, winetraininglen + 1))
#Fit Holt-Winter's exponential smoothing; model = "ZZZ" lets ets() choose the form,
#which here selects multiplicative error and seasonality with an additive trend (M,A,M)
HWwineets<- ets(winesalestrain, model = "ZZZ", restrict = FALSE)
HWwineets
## ETS(M,A,M) 
## 
## Call:
##  ets(y = winesalestrain, model = "ZZZ", restrict = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.0496 
##     beta  = 9e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4041.6833 
##     b = -7.1612 
##     s=1.1334 1.0383 0.8836 0.9461 1.2608 1.3697
##            1.1428 1.1137 0.9614 0.8588 0.7053 0.5862
## 
##   sigma:  0.0884
## 
##      AIC     AICc      BIC 
## 2897.314 2901.188 2951.212
HWwineforecast<- forecast(HWwineets, h = winevalidationlen)
HWwineforecast
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1994       1997.758 1771.339 2224.176 1651.480 2344.035
## Oct 1994       1856.790 1646.076 2067.505 1534.530 2179.051
## Nov 1994       2171.441 1924.685 2418.196 1794.060 2548.821
## Dec 1994       2358.853 2090.421 2627.284 1948.323 2769.382
## Jan 1995       1214.174 1075.801 1352.547 1002.551 1425.798
## Feb 1995       1453.749 1287.819 1619.679 1199.981 1707.517
## Mar 1995       1761.509 1560.131 1962.888 1453.527 2069.491
## Apr 1995       1962.291 1737.586 2186.996 1618.635 2305.947
## May 1995       2261.951 2002.484 2521.418 1865.131 2658.771
## Jun 1995       2309.418 2044.031 2574.805 1903.544 2715.292
## Jul 1995       2754.163 2437.078 3071.248 2269.223 3239.102
## Aug 1995       2522.481 2231.507 2813.454 2077.475 2967.487

8. C) Create a plot for the residuals from the Holt-Winter’s exponential smoothing.

#plot residuals

plot(HWwineets$residuals, xlab = "Year", ylab = "Wine Sales Residuals", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Residuals")
axis(1,at=seq(1980,1994,1), labels = format(seq(1980,1994,1)))
axis(2,at=seq(-.5,.5,.05),labels = format(seq(-5000,5000,500)),las = 2)

i. Based on this plot, which of the following statements are reasonable?

Statement (2) is reasonable: there is a strong correlation between sales in the same calendar month from one year to the next.
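
One way to see this (a quick check, not required by the question) is the autocorrelation of the training series, which should show a pronounced spike at the seasonal lag of 12 months:

#Autocorrelation of the training series; the lag-12 spike reflects the same-calendar-month correlation
Acf(winesalestrain, lag.max = 24)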

ii. How can you handle the above effect with exponential smoothing?

This effect is handled by including a seasonal component in the exponential smoothing model, which is exactly what Holt-Winter’s smoothing does: the seasonal indices capture the month-to-month pattern, and the seasonal smoothing constant (gamma) controls how quickly that pattern is updated. A low gamma, such as the very small value estimated above, keeps the seasonal pattern stable and prevents the model from overcorrecting to noise in any single month.
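
As a hedged sketch (the model form and gamma value here are illustrative assumptions, not part of the assignment), the seasonal smoothing constant can also be fixed explicitly when fitting the model:

#Refit with multiplicative error/seasonality, additive trend, and a small fixed seasonal smoothing constant
HWwinefixedgamma <- ets(winesalestrain, model = "MAM", gamma = 0.05, restrict = FALSE)
HWwinefixedgamma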