Assignment 4 - Smoothing Methods

Question 2

According to the text, the moving average constant w and the smoothing constant alpha (a) are approximately equal if w = 2/a - 1. In order to find what the smoothing constant should be to achieve an equivalent result, we must solve for alpha and insert the window width w. In this case, a = (2/w) + 1, then we would replace w with the actual window width and solve for a.

Question 5

A

Which of the following methods would not be suitable for forecasting this series?

The plot appears to have seasonality and trend, therefore:

Moving average of raw series - can only be used with a series that lacks both seasonality and trend, so no this method cannot be used.

Moving average of deseasonalized series - removes the seasonality from a series but not trend, so no, this method cannot be used in this case.

Simple exponential smoothing of the raw series - like moving average, this method cannot be used on a series with seasonality or trend, so no, it cannot be used.

Double exponential smoothing of the raw series - allows for changes in trend but not seasonality so no this method cannot be used.

Holt-Winter’s exponential smoothing of the raw series - allows for changes in both trend and seasonality, so yes, this method can be used.

B-i

Run the method on the data and request the forecasts on the validation period.

library(readxl)
library(forecast)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: timeDate

## This is forecast 7.3

# Create dataframe
Sales <- read_excel("~/rProjects/Assignment_4/Sales.xlsx")

# Set up time series
sales.ts <- ts(Sales$Sales, start=c(1), frequency = 4)

# Set the training and validation periods
nValid <- 4
nTrain <- length(sales.ts) - nValid

train.ts <- window(sales.ts, end = c(1, nTrain))
valid.ts <- window(sales.ts, start=c(1, nTrain + 1))

# Create simple exponential smoothing model
sesSales <- ets(train.ts, restrict = FALSE, model="ZZZ", alpha=0.2, beta = .15, gamma = .05)

# Generate forcasts for validation period
sesSales.pred <- forecast(sesSales, h = nValid, level=0)

# Request the forecasts on the validation period
sesSales.pred

##      Point Forecast      Lo 0      Hi 0
## 6 Q1       62115.16  62115.16  62115.16
## 6 Q2       65371.60  65371.60  65371.60
## 6 Q3       77076.87  77076.87  77076.87
## 6 Q4      102937.73 102937.73 102937.73

B-ii

Compute the MAPE values for the forecasts of quarters 21-22.

accuracy(sesSales.pred$mean[1:2], nValid[1:2])

##                 ME     RMSE      MAE      MPE    MAPE
## Test set -62111.16 62111.16 62111.16 -1552779 1552779

C

Reproduce the fit and residual plots in r that are found in the text. Is this model suitable for forecasting Q21 & Q22?

yrange = range(sales.ts)

par(mar = c(5, 5, 4, 1), mgp = c(4, 1, 0))

# Create Plot - adjust axis ranges and titles
plot(c(1, 5), c(0, yrange[2]), type="n", xlab="Quarter", ylab="Sales ($)", bty="l", xaxt="n", yaxt="n", main = "Exp Smoothing: Actual Vs Forecast (Training Data)")

# Add the time series training set
lines(sales.ts, bty="l")

# Add the x-axis labels
axis(1, at=seq(1,5,1), labels=format(seq(1,5,1)))

# Add the y-axis labels
axis(2, at=seq(0,120000,10000), labels=format(seq(0,120000,10000)), las=2)

# Add the fitted values to the training set
lines(sesSales$fitted, col="red", lwd=2)

# Add the legend
legend(14,35000, c("Actual","Forecast"), col=1:2, lty=1:1, lwd = 1:2)

# Set up the plot
yRange2 = range(sesSales$residuals)

# Set margins for plot
par(mar = c(5, 5, 4, 1), mgp = c(3.5, 1, 0))

# Create Plot - adjust axis ranges and titles
plot(c(1, 5), yRange2, type="n", xlab="Quarter", ylab="Forecast error", bty="n", xaxt="n", yaxt="n", main = "Exp Smoothing: Forecast Errors (Training Data)")

# Add the x-axis labels
axis(1, at=seq(-1,5,1), labels=format(seq(-1,5,1)), pos = 0)

# Add the y-axis labels
axis(2, at=seq(-.25, .25, .01), labels=format(seq(-.25, .25, .01)),las=2)

# Add the residual values to the training set
lines(sesSales$residuals, col="black", lwd=2)

My one concern that I have in saying yes, this model would be good for forecasting Q21 & Q22, is that the errors seem very small; leading me to believe that we may have over fit the training period. I would try other variations of alpha, beta, gamma and model before conclusively saying that the above model is best.

D

Use differencing to remove trend and seasonality

First remove seasonality then trend:

# Set up the plot to have two rows and two columns
par(mfrow = c(2,2))

# plot the original time series
plot(sales.ts, ylab="Sales ($)", xlab="Quarter", bty="l", main="Sales by Quarter")

# Plot the lag-12 difference (de-seasonalized)
plot(diff(sales.ts, lag=4), ylab="Lag-4", xlab="Quarter", bty="l", main="Lag-4 Difference")

# Plot the lag-1 difference (de-trended)
plot(diff(sales.ts, lag=1), ylab="Lag-1", xlab="Quarter", bty="l", main="Lag-1 Difference")

# Plot the double-difference
lag4ThenLag1 <- diff(diff(sales.ts, lag=4), lag=1)
plot(lag4ThenLag1, ylab="Lag-4, then Lag-1", xlab="Quarter", bty="l", main="Twice-Differenced (Lag-4, Lag-1)")

Next, remove trend then seasonality:

# Set up the plot to have two rows and two columns
par(mfrow = c(2,2))

# plot the original time series
plot(sales.ts, ylab="Sales ($)", xlab="Quarter", bty="l", main="Sales by Quarter")

# Plot the lag-1 difference (de-trended)
plot(diff(sales.ts, lag=1), ylab="Lag-1", xlab="Quarter", bty="l", main="Lag-1 Difference")

# Plot the lag-12 difference (de-seasonalized)
plot(diff(sales.ts, lag=4), ylab="Lag-4", xlab="Quarter", bty="l", main="Lag-4 Difference")

# Plot the double-difference
lag1ThenLag4 <- diff(diff(sales.ts, lag=1), lag=4)
plot(lag1ThenLag4, ylab="Lag-1, then Lag-4", xlab="Quarter", bty="l", main="Twice-Differenced (Lag-1, Lag-4)")

Conclusion: Looking at the two final Twice-Differenced plots, I would say that it does not matter which pattern is removed first, trend or seasonality, because you end up with the same result.

E

ddSales.pred <- meanf(diff(diff(train.ts, lag=4), lag=1), h=2)
ddSales.pred

##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514

# Set up empty vector
adjustBack <- vector()

for (i in 1:nValid) {
  if(i == 1) {
    adjustBack[i] <- ddSales.pred$mean[i] + train.ts[(nTrain+i)-nValid] + (train.ts[nTrain] - train.ts[nTrain - nValid])
  } else {
    adjustBack[i] <- ddSales.pred$mean[i] + train.ts[(nTrain+i)-nValid] + (adjustBack[i-1] - train.ts[nTrain+i-1-nValid])
  }
}

# Adjusted forecasted Quarters 21 & 22
adjustBack

## [1] 63982.2 68177.4      NA      NA

F

When comparing the adjusted back double difference values from E to the forecasted values from B (seen below), you can see that they are very similar. Q21 differs by 1867.04, and Q22 differs by 2805.8. If I had to choose, I would use the exponential smoothing method from B because it does not require values to be adjusted back. I believe that by reducing data manipulation steps, you also reduce the risk of error.

sesSales.pred

##      Point Forecast      Lo 0      Hi 0
## 6 Q1       62115.16  62115.16  62115.16
## 6 Q2       65371.60  65371.60  65371.60
## 6 Q3       77076.87  77076.87  77076.87
## 6 Q4      102937.73 102937.73 102937.73

G

An even simpler method that I would use as a baseline would be the seasonal naive forecast. The other, more basic methods talked about in our current chapter should not be applied to a series that has seasonality and/or trend. And we learned earlier that using the seasonal naive forecasts are the easist and often quite accurate method to use for forecasting.

# Set up snaive window: quarters 17 - 20
seasonal <- window(sales.ts, start = c(1, nTrain - (nValid - 1)), end = c(1, nTrain))

seasonal

##    Qtr1  Qtr2  Qtr3  Qtr4
## 5 56405 60031 71486 92183

valid.ts

##     Qtr1   Qtr2   Qtr3   Qtr4
## 6  60800  64900  76997 103337

#Plot actual validation period with seasonal naive forecast
par(mar = c(5, 5, 4, 1), mgp = c(3.5, 1, 0))
plot(valid.ts, type='l', col = "black", ylab = "Sales ($)", xlab = "Quarter", main = "Validation Period with Seasonal Naive Forecast", xaxt = 'n')
axis(1, at=c(6, 6.25, 6.5, 6.75), labels=c("Q17/21", "Q18/22", "Q19/23", "Q20/24" ), las = 2)
legend(6.1, 100000, c("Actual","Seasonal Naive Forecast"), col=1:2, lty=1:2)
par(new=T)
plot(seasonal, type='l', lty = 2, col = "red", ylab = " ", xlab = " ", axes = F)

par(new=F)

Question 8

A

I would use Holt-Winter’s Exponential Smoothing method if I could only choose one for all series. Each different wine series shows trend, seasonality or both and this method handles each of those scenarios the best with the least amount of work (recalling and differencing requires extra steps to adjust back).

B

# Load data set
wineSales <- read_excel("~/rProjects/Assignment_4/wineSales.xlsx")

# Create time series
fortWine.ts <- ts(wineSales$Fortified, start=c(1980, 1), frequency=12)

# Set training period and validation period lengths
fwValidLength <- 12
fwTrainLength <- length(fortWine.ts) - fwValidLength

# Partition the data into training and validation periods
fwTrain <- window(fortWine.ts, end=c(1980, fwTrainLength))
fwValid <- window(fortWine.ts, start=c(1980,fwTrainLength + 1))

# Apply Holt-Winter's exponential smoothing with multiplicative seasonality to sales
hwFortWine <- ets(fwTrain, restrict = FALSE, model = "ZZM")

hwFortWine

## ETS(M,A,M) 
## 
## Call:
##  ets(y = fwTrain, model = "ZZM", restrict = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.0555 
##     beta  = 9e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4040.0811 
##     b = -6.7983 
##     s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862
##            1.1463 1.1097 0.9345 0.8513 0.6996 0.5903
## 
##   sigma:  0.0859
## 
##      AIC     AICc      BIC 
## 2755.038 2759.118 2808.145

When I applied Holt Winter’s method, I specified multiplicative seasonality and allowed ets to determine the best fit model. The selection resulted in: alpha = .0555, beta = .00009, .00001 as shown above.

The below output displays the predicions for the first 2 months of 1994.

# Make predicitions for entire validation period
hwFortWine.pred <- forecast(hwFortWine, h = fwValidLength, level = 0)

# Make predicitions for the next 2 months only
hwFortWine.pred.next2 <- forecast(hwFortWine, h = fwValidLength - 10, level = 0)

hwFortWine.pred.next2

##          Point Forecast     Lo 0     Hi 0
## Jan 1994       1289.829 1289.829 1289.829
## Feb 1994       1521.475 1521.475 1521.475

C

# Set up the plot
yRange3 = range(hwFortWine$residuals)

# Set margins for plot
par(mar = c(5, 5, 4, 1), mgp = c(3.5, 1, 0))

# Create Plot - adjust axis ranges and titles
plot(c(1980, 1994), yRange3, type="n", xlab="Validation Period", ylab="Forecast error", bty="n", main = "Exp Smoothing: Forecast Errors (Training Data)")

# Add the residual values to the training set
lines(hwFortWine$residuals, col="black", lwd=2)

C-i

1. Using the plot above and the chart below as a guide, I would agree that the December months are not captured well. I came to this conclusion based on the relatively high errors (both positive and negative) in December compared to other months. Also, looking at the plot of sales broken out by month, it is possible that the errors tend to be higher for December because December sales fall near the middle of the pack and so the forecasts are trying to account for months like January and July when sales are at their lowest and highest respectively. This could indicate over-fitting the model to the time series.

2. There does appear to be a correlation between sales within the same calendar month - this would account for the trend that we see in the residuals plot. Again, this can be seen in the sales broken out by month below.

3. I do not believe that it is reasonable to say that the model does not capture the seasonality well because Holt-Winter’s model takes seasonality (s) into account, which can be seen in the print out of ets: s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862 1.1463 1.1097 0.9345 0.8513 0.6996 0.5903.

4. There is no need to deseasonlize the data before applying Holt-Winter’s model, and the reason why is outlined in C-ii below.

C-ii

In order to account for the apparent multiple seasonalities within this series, we would want to use Holt-Winter’s recently extended model that accounts for multiple seasonal cycles. This can be accomplished by using the dshw or tbats or stlm and msts functions within r. The msts function helps to set up the time series when multiple seasons are present; dshw is applied when the seasonal periods are nested, the longer period divided by the shorter period must be an integer; tbats allows for dynamic seasonality and stlm seasonally adjusts the data, then re-seasonalizes the results by adding back the last year of the estimated seasonal component.

For our case with the fortified wines, the dshw function should work just fine because there are yearly and montly seasonalities which meets the criteria for dshw.

hwFortWine$residuals

##                Jan           Feb           Mar           Apr           May
## 1980  0.0856916452  0.1898210325 -0.0764727077 -0.1798850412 -0.1563785016
## 1981  0.0059436319  0.0918795734 -0.0897918382  0.0201690938 -0.0098080926
## 1982  0.0712547458 -0.0436091662 -0.0316867753  0.0724436794  0.0037771839
## 1983 -0.0109992268 -0.0071956238 -0.0215978216 -0.0318948934 -0.0416374353
## 1984 -0.0613302737  0.0618007288  0.2139964357 -0.1431734117  0.0391341016
## 1985  0.0683803035 -0.0649625770 -0.0338068342 -0.0406367754  0.1251885009
## 1986  0.0238211823  0.0013563191 -0.1347741159 -0.0675839057  0.0588632059
## 1987 -0.0992003433 -0.0622866459 -0.1132749063  0.1843444815 -0.0388387194
## 1988  0.0352741311  0.0936473479 -0.1003948181  0.0453411909 -0.0236154203
## 1989 -0.0391954057  0.0158109819 -0.0389847643  0.0591540623  0.0490895763
## 1990  0.0517693123 -0.0718628406  0.1687152513 -0.0647409008  0.0195989450
## 1991 -0.1192472284 -0.0477764048  0.1482393557  0.0658718006  0.0121035067
## 1992  0.0730428405 -0.0371836834  0.0234225692  0.0691088533  0.0671325756
## 1993 -0.1304969546 -0.1325531384  0.0032986925 -0.0568799873 -0.0763908845
##                Jun           Jul           Aug           Sep           Oct
## 1980 -0.0736132978 -0.0448583867 -0.1144922439  0.0619397310  0.1018881833
## 1981  0.1298865277  0.0582425474  0.1180329238 -0.0102235525 -0.1504902472
## 1982  0.0146620961 -0.0183464736  0.0909186590 -0.0620021791  0.0703981648
## 1983 -0.0178286405 -0.1053832054  0.1687981392  0.1854005781 -0.0345471609
## 1984 -0.1078849924 -0.1228610561  0.0328535997 -0.1152969777 -0.0493248417
## 1985 -0.0799326762 -0.1184530618  0.0305371891 -0.0694729752  0.0182574695
## 1986 -0.0700175701  0.0293991637  0.1405415592 -0.0505957045  0.0473784126
## 1987 -0.1207649988 -0.0007427876 -0.0797537598  0.0583529025 -0.0001658655
## 1988  0.0468427599  0.0821674019 -0.1317615320 -0.0484914419 -0.0735216262
## 1989  0.0424830373  0.1027116620  0.0038703422 -0.0285328936 -0.0101219934
## 1990  0.0852197649  0.0856198588 -0.1276275467 -0.0367264206  0.0714967452
## 1991 -0.0437321644 -0.0112744450 -0.0500835757  0.0090174126 -0.0768809244
## 1992  0.0188500781 -0.0238661341 -0.1452583925  0.0062501297  0.0748104249
## 1993  0.0302293012 -0.0611141161 -0.0735140000  0.0158028950 -0.0925461516
##                Nov           Dec
## 1980 -0.1009171682 -0.1369630197
## 1981 -0.0385957347 -0.1531194220
## 1982 -0.0145198469 -0.1741227820
## 1983 -0.0670439870 -0.0743708121
## 1984 -0.0295490987 -0.1725130567
## 1985  0.0324793977 -0.0845918279
## 1986 -0.1088241588  0.0990064251
## 1987  0.0250778862  0.1488585329
## 1988  0.0010116307  0.1302518073
## 1989  0.0791146339  0.0499986373
## 1990  0.0727357726 -0.1183523199
## 1991 -0.0121667473  0.1528348070
## 1992  0.0377991451  0.0723997274
## 1993  0.1154352567  0.1162038836

FortifiedWineSales <- read_excel("~/rProjects/Assignment_4/FortifiedWineSales.xlsx")
FortifiedWineSales.ts <- ts(FortifiedWineSales$Fortified, start=c(1980, 1), frequency=12)

# set the outer margin area to the right a bit bigger
par(oma = c(0, 0, 0, 2))

# We have 15 years of data
xrange <- c(1,16)
# Get the range of the ridership to set up a nicely formatted plot
yrange <- range(FortifiedWineSales.ts)

# set up the plot 
plot(xrange, yrange, type="n", xlab="Year", ylab="Fortified Wine Sales", bty="l")

# Give each of the months its own color, line type, and character
colors <- rainbow(12) 
linetype <- c(1:12) 
plotchar <- c(1:12)

# add lines 
for (i in 1:12) { 
  currentMonth <- subset(FortifiedWineSales.ts, cycle(FortifiedWineSales.ts)==i)
  lines(currentMonth, type="b", lwd=1.5,
      lty=linetype[i], col=colors[i], pch=plotchar[i]) 
} 

# add a title
title("Sales Broken Out by Month")

# add a legend 
legend(16,5000, 1:12, cex=0.8, col=colors, pch=plotchar, lty=linetype, title="Month", xpd=NA)

Assignment 4 - Smoothing Methods

Ashley Chase

March 19, 2017

Question 2

Question 5

A

Which of the following methods would not be suitable for forecasting this series?

The plot appears to have seasonality and trend, therefore:

Moving average of raw series - can only be used with a series that lacks both seasonality and trend, so no this method cannot be used.

Moving average of deseasonalized series - removes the seasonality from a series but not trend, so no, this method cannot be used in this case.

Simple exponential smoothing of the raw series - like moving average, this method cannot be used on a series with seasonality or trend, so no, it cannot be used.

Double exponential smoothing of the raw series - allows for changes in trend but not seasonality so no this method cannot be used.

Holt-Winter’s exponential smoothing of the raw series - allows for changes in both trend and seasonality, so yes, this method can be used.

B-i

Run the method on the data and request the forecasts on the validation period.

B-ii

Compute the MAPE values for the forecasts of quarters 21-22.

C

Reproduce the fit and residual plots in r that are found in the text. Is this model suitable for forecasting Q21 & Q22?

D

Use differencing to remove trend and seasonality

First remove seasonality then trend:

Next, remove trend then seasonality:

Conclusion: Looking at the two final Twice-Differenced plots, I would say that it does not matter which pattern is removed first, trend or seasonality, because you end up with the same result.

E

F

G

Question 8

A

B

When I applied Holt Winter’s method, I specified multiplicative seasonality and allowed ets to determine the best fit model. The selection resulted in: alpha = .0555, beta = .00009, .00001 as shown above.

The below output displays the predicions for the first 2 months of 1994.

C

C-i

2. There does appear to be a correlation between sales within the same calendar month - this would account for the trend that we see in the residuals plot. Again, this can be seen in the sales broken out by month below.

3. I do not believe that it is reasonable to say that the model does not capture the seasonality well because Holt-Winter’s model takes seasonality (s) into account, which can be seen in the print out of ets: s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862 1.1463 1.1097 0.9345 0.8513 0.6996 0.5903.

4. There is no need to deseasonlize the data before applying Holt-Winter’s model, and the reason why is outlined in C-ii below.

C-ii

For our case with the fortified wines, the dshw function should work just fine because there are yearly and montly seasonalities which meets the criteria for dshw.