Assignment 4

Problem 2: Relationship between Moving Average and Exponential Smoothing To achieve an equivalent result using simple exponential smoothing the smoothing constant should be equal to 2 over the window plus one:

\[\alpha = \frac{2}{w+1}\] The MA with window width w is approximately equal when this is true. With a very short window span we would want the smoothing constant to be close to 1. The closer alpha is to 1, the less impact past values have on the forecast.

Problem 5: Forecasting Department Store Sales

# Read in Department Store Sales file
StoreSales <- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
# Shows structure of StoreSales data frame
str(StoreSales)

## 'data.frame':    24 obs. of  2 variables:
##  $ Quarter: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sales  : int  50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...

#Create time series object
SalesTS <- ts(StoreSales$Sales, start = c(1), frequency = 1)
salests <- ts(StoreSales$Sales, start = c(1), frequency = 4)

# Regenerate plot of time series object SalesTS shown in Figure 5.10
par(oma = c(0,1,0,0))
at_tick <- seq_len(length(SalesTS + 1))
plot((1:24), ylim = c(0,120000), xlim = c(1,24), type = "n", xlab = "Quarter", ylab = "Sales ($)", bty = "l", xaxt = "n", yaxt = "n", font.lab = 2)
points(SalesTS, pch = 19)
axis(2, at = seq(0, 120000, 20000), labels = format(seq(0, 120000, 20000)), las = 2, cex.axis = 0.5)
lines(SalesTS, bty = "l")
axis(1, at = at_tick, labels = FALSE)
axis(1, at = seq_along(SalesTS) - 0.5, tick = FALSE, labels = (1:24), cex.axis = 0.5)

# Create plot of time series object SalesTS
plot(c(1,24), ylim = c(40000,110000), xlim = c(1,24), type = "n", xlab = "Quarter", ylab = "Sales (in thousands)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at = seq(1,24,1), labels = format(seq(1,24,1)), cex.axis = 0.75)
axis(2, at = seq(40000, 110000, 10000), labels = format(seq(40, 110, 10)), las = 2, cex.axis = 0.75)
lines(SalesTS, bty = "l")
title("Department Store Quarterly Sales")

# Create year time series plot
plot(c(1,7), ylim = c(40000,120000), xlim = c(1,7), type = "n", xlab = "Year", ylab = "Sales (in thousands)", bty = "l", xaxt = "n", yaxt = "n")
box()
box(which = "plot")
minor.tick(nx = 3, ny = 2, tick.ratio = 0.5)
lines(salests, bty = "l")
axis(1, at = seq(1,7,1), labels = format(seq(1,7,1)))
axis(2, at = seq(40000, 120000, 20000), labels = format(seq(40, 120, 20)), las = 2)

Which of the following methods would not be suitable for forecasting this series? Explain why or why not. The raw series appears to have seasonality and trend. The same cycle is repeated every year and there is an upward trend.

Moving average of raw series MA of the raw series is not suitable for forecasting because it does not take seasonality or trend into consideration.
Moving average of deseasonalized series The MA is used for forecasting series without seasonality and trend. The deseasonalized series would still have trend and the MA would “lag behind”, which would result in under-forecasting.
Simple exponential smoothing of the raw series
SES of the raw series is not suitable for forecasting this series. LIke MA, SES is also used for series without seasonality and trend.
Double exponential smoothing of the raw series
Holt’s linear trend model is not suitable for forecsating this series because it only takes trend into account.
Holt-Winter’s exponential smoothing of the raw series
Holt-Winter’s exponential smoothing of the raw series is suitable for forecasting because it accounts for seasonality and trend.

Although only the last method would be suitable, differencing at lag-M, where M = 4 followed by lag-1, would remove the trend and/or seasonality, making the other methods suitable for forecasting the series.

The last 4 quarters of the series represent the validation period and the previous 20 quarters make up the training period.

# Set up the training and validation period lengths, where the validation length is equal to 4 quarters
valid.Length <- 4
train.Length <- length(salests) - valid.Length
sales.Train <- window(salests, end = c(1,train.Length))
sales.Valid <- window(salests, start = c(1, train.Length + 1))
ses.Sales <- ets(sales.Train, model = "ZZZ", alpha = 0.2, beta = 0.15, gamma = 0.05)

plot(c(1,7), ylim = c(40000,120000), xlim = c(1,7), type = "n", xlab = "Year", ylab = "Sales (in thousands)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at = seq(1,7,1), labels = format(seq(1,7,1)), cex.axis = 0.75)
axis(2, at = seq(40000, 120000, 20000), labels = format(seq(40, 120, 20)), las = 2, cex.axis = 0.75)
lines(salests, bty = "l")

ses.SalesPredictions <- forecast(ses.Sales, h = valid.Length, level = 0)
ses.SalesPredictions

##      Point Forecast      Lo 0      Hi 0
## 6 Q1       62115.16  62115.16  62115.16
## 6 Q2       65371.60  65371.60  65371.60
## 6 Q3       77076.87  77076.87  77076.87
## 6 Q4      102937.73 102937.73 102937.73

lines(ses.SalesPredictions$mean, col = "red", lwd = 2, lty = 2)

abline(v=6)
arrows(1, 100000, 6, 100000, code = 3, length = 0.1)
text(3.5, 110000, "Training", cex = 0.75)
abline(v=7)
arrows(6, 100000, 7, 100000, code = 3, length = 0.1)
text(6.5, 110000, "Validation", cex = 0.75)

Run this method on the data. Request the forecasts on the validation period

ses.SalesPredictions

##      Point Forecast      Lo 0      Hi 0
## 6 Q1       62115.16  62115.16  62115.16
## 6 Q2       65371.60  65371.60  65371.60
## 6 Q3       77076.87  77076.87  77076.87
## 6 Q4      102937.73 102937.73 102937.73

Compute MAPE values for Q21 and Q22.

Q21_accuracy <- accuracy(ses.SalesPredictions$mean[1], sales.Valid[1])
Q22_accuracy <- accuracy(ses.SalesPredictions$mean[2], sales.Valid[2])
combined_accuracy <- accuracy(ses.SalesPredictions$mean[1:2], sales.Valid[1:2])
Q21_accuracy

##                 ME     RMSE      MAE       MPE     MAPE
## Test set -1315.156 1315.156 1315.156 -2.163086 2.163086

Q22_accuracy

##                 ME     RMSE      MAE        MPE      MAPE
## Test set -471.6041 471.6041 471.6041 -0.7266627 0.7266627

combined_accuracy

##                 ME    RMSE      MAE       MPE     MAPE
## Test set -893.3803 987.939 893.3803 -1.444875 1.444875

Reproduce the fit and residuals displayed in figure 5.11. Is the model suitable for forecasting quarters 21-22?

pointForecasts <- meanf(diff(diff(sales.Train, lag = 4), lag = 1), h=4)

realForecasts <- vector()
for (i in 1:valid.Length)  {
  if(i == 1)  {
    realForecasts[i] <- pointForecasts$mean[i] + sales.Train[(train.Length+i)-valid.Length] + (sales.Train[train.Length] - sales.Train[train.Length - valid.Length])
  } else {
    realForecasts[i] <- pointForecasts$mean[i] + sales.Train[(train.Length+i)-valid.Length] + (realForecasts[i-1] - sales.Train[train.Length+i-1-valid.Length])
  }
}

yrange <- range(SalesTS)
par(oma = c(0,2,0,0))
plot(c(1,20), yrange, type = "n", xlab = "Quarter", ylab = "Sales ($)", bty = "l", xaxt = "n", yaxt = "n")
lines(sales.Train, bty = "l")
axis(1, at = seq(1,20,1), labels = format(seq(1,20,1)), cex.axis = 0.5)
axis(2, at = seq(0, 100000, 10000), labels = format(seq(0, 100000, 10000)), las = 2, cex.axis = 0.5)
lines(ses.Sales$fitted, col = "red", lty = 2, lwd = 2)
legend(1,100000, c("Actual", "Forecast"), lwd = 2, lty = c(1,2), col = c("black", "red"), bty = "n")
title("Exp Smoothing: Actual Vs Forecast (Training Data)")

plot(c(1,6),yrange,type="n",xlab="Years",ylab="Sales (thousands)",bty="l",xaxt="n",yaxt="n")
lines(sales.Train,bty="l")
axis(1, at=seq(1,6,1), labels=format(seq(1,6,1)))
axis(2, at=seq(50000,110000,10000), labels=format(seq(50,110,10)), las=2)
lines(ses.Sales$fitted,col="red",lty=2,lwd=2)
legend(1,100000, c("Actual", "Forecast"),lwd=2, lty=c(1,2), col=c("black", "red"), bty="n")
title("Exp Smoothing: Actual Vs Forecast (Training Data)")

plot(ses.SalesPredictions$residuals,xlab="Years",ylab="Forecast Error",bty="n",xaxt="n",yaxt="n")
axis(1,pos=c(0,0), at=seq(1,6,1), labels=format(seq(1,6,1)))
axis(2, at=seq(-0.04,0.06,0.02), labels=format(seq(-0.04,0.06,0.02)), las=2)
abline(h = 0)
title("Exp Smoothing Forecasting Errors (Training Data)")

The model produced by the Holt-Winter’s method is suitable for forecasting quarters 21-22 as evidenced by the low MAPE for Q21 & Q22.

Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the opposite order? Show a progression of time plots as you difference the data and each final series to provide evidence in support of your answer.

Double difference by removing trend and then seasonal pattern

# Set up the plot to have two rows and two columns
par(mfrow=c(2,2))

# Plot original series
plot(salests,ylab="Sales", xlab="Years",bty="l",main="Department Store Sales")

# Double difference (trend then seasonality)
# Plot lag-1 difference (de-trended)
plot(diff(salests, lag = 1), ylab="Lag-1", xlab="Years", bty="l", main="Lag-1 Difference")

# Plot lag-4 difference (deseasonalized)
plot(diff(salests, lag = 4), ylab="Lag-4", xlab="Years", bty="l", main="Lag-4 Difference")

# Double difference the lag-1 and lag-4 (de-trended then deseasonalized)
lag1ThenLag4 <- diff(diff(salests, lag=1),lag=4)

# Plot the double difference
plot(lag1ThenLag4, ylab="Lag-1, then Lag-4", xlab="Years", bty="l", main="Twice-Differenced (Lag-1, Lag-4)")

Double difference by removing seasonal and then trend pattern

# Set up the plot to have two rows and two columns
par(mfrow=c(2,2))

# Plot original series
plot(salests,ylab="Sales", xlab="Years",bty="l",main="Department Store Sales")

# Double difference (trend then seasonality)
# Plot lag-4 difference (deseasonalized)
plot(diff(salests, lag = 4), ylab="Lag-4", xlab="Years", bty="l", main="Lag-4 Difference")

# Plot lag-1 difference (de-trended)
plot(diff(salests, lag = 1), ylab="Lag-1", xlab="Years", bty="l", main="Lag-1 Difference")

# Double difference the lag-4 and then lag-1 (deseasonalized and detrended)
lag4ThenLag1 <- diff(diff(salests, lag=4),lag=1)

# Plot the double difference
plot(lag4ThenLag1, ylab="Lag-4, then Lag-1", xlab="Years", bty="l", main="Twice-Differenced (Lag-4, Lag-1)")

Forecast quarters 21-22 using the average of the double-differenced series from (d).

Using the meanf() function will show the point forecast and prediction interval for the stationary process produced from double differencing.

pointForecasts <- meanf(diff(diff(sales.Train, lag=4), lag=1), h=4)
pointForecasts

##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 6 Q1          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q3          569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q4          569.2 -2116.935 3255.335 -3714.114 4852.514

Adjust back for the trend and seasonal pattern.

realForecasts <- vector()

for (i in 1:valid.Length) {
  if(i == 1) {
    realForecasts[i] <- pointForecasts$mean[i] + sales.Train[(train.Length+i)-valid.Length] + (sales.Train[train.Length] - sales.Train[train.Length - valid.Length])
  } else {
    realForecasts[i] <- pointForecasts$mean[i] + sales.Train[(train.Length+i)-valid.Length] + (realForecasts[i-1] - sales.Train[train.Length+i-1-valid.Length])
  }
}
# plot
par(mfrow = c(1,1))
plot(realForecasts, type="l", bty="l")

realForecasts

## [1]  63982.2  68177.4  80201.6 101467.8

Compare forecasts from (e) to the exponential smoothing forecasts in the table in (b). Which of the two forecasting methods would you choose? Explain.

Plot the double-differenced forecasts against the actual

# Set the y range to help set up the plot
yrange = range(salests)

# Set up the plot
plot(c(1,7),yrange,type="n",xlab="Years",ylab="Sales (thousands)",bty="l",xaxt="n",yaxt="n")

# Add the x and y axes
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
axis(2, at=seq(40000,120000,20000), labels=format(seq(40,120,20)), las=2)

# Add the "visual breakpoints"
abline(v=6)
arrows(1, 100000, 6, 100000, code=3, length=0.1)
text(3.5, 102000, "Training")
abline(v=7)
arrows(6, 100000, 7, 100000, code=3, length=0.1)
text(6.5, 102000, "Validation")

# Add the time series for training set
lines(salests,bty="l")

# Add forecasts for validation period
lines(seq(6, 7-1/valid.Length, 1/valid.Length), realForecasts, col="red", lwd=2, lty=2)

# Add a legend to the plot
legend(1,100000, c("Actual", "Double Differenced"),lwd=2, lty=c(1,2), col=c("black", "red"), bty="n")

# Take a look at how good the forecasts are by calling the accuracy () function
accuracy(realForecasts, sales.Valid)

##                ME    RMSE     MAE       MPE     MAPE        ACF1 Theil's U
## Test set -1948.75 2942.41 2883.35 -3.159236 4.063656 -0.07650809 0.1948234

The Holt-Winter’s method gives a lower MAPE, so I would use that instead of the double differenced method.

What is an even simpler approach that should be compared as a baseline?

The trailing moving average is a simple approach that can be used as a baseline. The last value of the time series is the forecasted value for the entire validation period. The seasonal naive forecast is another simple approach that can be used as a baseline. Naive methods are often used as baselines because of their simplicity.

# Find the trailing moving average based on the training set
trailingMA <- rollmean(sales.Train, k=4, align = "right")
trailingMA.Forecast <- tail(trailingMA, 1)
trailingForecasts <- ts(rep(trailingMA.Forecast, valid.Length), start = c(1,train.Length + 1), freq = 4)

# Find the naive forecast
salesNaive <- snaive(sales.Train,h=valid.Length)

yrange = range(salests)

plot(c(1,7), yrange, type = "n", xlab = "Year", ylab = "Sales $ (in thousands)", bty = "l", xaxt = "n", yaxt = "n")

# Add the original time series
lines(salests, bty = "l")

# Add x and y axes
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
axis(2, at=seq(40000,120000,20000), labels=format(seq(40,120,20)), las=2)

# Add the "visual breakpoints"
abline(v=6)
arrows(1, 95000, 6, 95000, code=3, length=0.1)
text(3.5, 100000, "Training", cex = 0.75)
abline(v=7)
arrows(6, 95000, 7, 95000, code=3, length=0.1)
text(6.5, 100000, "Validation", cex = 0.75)

# Add trailing moving average and seasonal naive forecasts
lines(trailingForecasts, col = "red", lty = 2)

lines(salesNaive$mean, col = "blue", lty = 3)

# Add legend
legend(1,94000, c("Actual", "Trailing MA", "Naive"),lwd=2, lty=c(1,2,3), col=c("black", "red", "blue"), bty="n", cex = 0.75)

accuracy(trailingForecasts, sales.Valid)

##               ME     RMSE     MAE      MPE     MAPE      ACF1 Theil's U
## Test set 6482.25 17815.72 13658.5 4.553724 16.09045 0.1722954  1.147438

accuracy(salesNaive, sales.Valid)

##                   ME     RMSE     MAE      MPE     MAPE     MASE
## Training set 2925.25 3878.784 3116.50 4.344358 4.737739 1.000000
## Test set     6482.25 7032.176 6482.25 8.170540 8.170540 2.079978
##                    ACF1 Theil's U
## Training set 0.54855812        NA
## Test set     0.01334402 0.4705349

Problem 8: Forecasting Australian Wine Sales

AusWines <- read.csv("AustralianWines.csv", stringsAsFactors = FALSE)

Which smoothing method would you choose if you had to choose the same method for forecasting all series? Why? Since each series is different and having trend and seasonality differs between them I would choose the Holt-Winter’s method, which accounts for seasonality and trend. If one or neither is present, the model can be adjusted to account for this (i.e. use of smoothing constants alph, beta and gamma).
Produce as accurate a forecast as possible for the next two months for fortified wine sales.

fortifiedTS <- ts(AusWines$Fortified, start=c(1980,1), frequency=12)

yrange <- range(fortifiedTS)
xrange <- range(AusWines$Month)
plot(fortifiedTS, xlab = "Year", ylab = "Fortified Wine Sales")

valid.Length <- 12
train.Length <- length(fortifiedTS) - valid.Length
fortified.Train <- window(fortifiedTS,end=c(1980,train.Length))
fortified.Valid <- window(fortifiedTS,start=c(1980,train.Length+1),end=c(1980,train.Length+valid.Length))
sesFortified <- ets(fortified.Train,model="ZZZ",restrict=FALSE, allow.multiplicative.trend = TRUE)
sesFortified

## ETS(M,M,M) 
## 
## Call:
##  ets(y = fortified.Train, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.0203 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4039.3422 
##     b = 0.9965 
##     s=1.1306 1.0327 0.8852 0.9611 1.2728 1.3569
##            1.1372 1.1133 0.9631 0.862 0.6926 0.5926
## 
##   sigma:  0.0876
## 
##      AIC     AICc      BIC 
## 2892.668 2896.541 2946.566

sesFortifiedOpt <- ets(fortified.Train, model = "MMM", alpha = 0.02)
sesFortifiedOpt

## ETS(M,M,M) 
## 
## Call:
##  ets(y = fortified.Train, model = "MMM", alpha = 0.02) 
## 
##   Smoothing parameters:
##     alpha = 0.02 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4081.2317 
##     b = 0.9964 
##     s=1.1253 1.044 0.8885 0.9475 1.2629 1.3737
##            1.1428 1.118 0.9565 0.8577 0.699 0.5841
## 
##   sigma:  0.0872
## 
##      AIC     AICc      BIC 
## 2889.229 2892.650 2939.957

sesFortifiedPredictions <- forecast(sesFortifiedOpt,h=valid.Length,level=0)
sesFortifiedPredictions$mean[1:2]

## [1] 2051.842 1917.207

plot(sesFortifiedPredictions,bty="l")

accuracy(sesFortifiedPredictions$mean[1:2],fortified.Valid[1:2])

##                ME    RMSE      MAE      MPE     MAPE
## Test set 231.4753 266.806 231.4753 9.986598 9.986598

Create a plot for the residuals from the Holt-Winter’s exponential smoothing.

plot(sesFortifiedPredictions$residuals,xlab="Year",ylab="Forecast Error",bty="n",xaxt="n")
axis(1, at=seq(1980,1995,1), labels=format(seq(1980,1995,1)))
abline(h = 0)
title("Exp Smoothing Forecasting Errors (Training Data)")

Based on this plot, which of the following statements are reasonable? The first three statements are reasonable. The last is not because the Holt-Winter’s method accounts for seasonality.
How can you handle the above effect with exponential smoothing? You can try adjusting the three smoothing constants to optimize the forecasts.

The double-seasonal Holt-Winter’s method by Taylor could be used if there are more than two seasonal cycles.

Assignment 4

Meryn Lounsbury

March 21, 2017