Question 2 Relationship between Moving Average and Exponential Smoothing

Assume that we apply a moving average to a time series, using a very short window span. If we wanted to achieve an equivalent result using simple exponential smoothing, what value should the smoothing constant take?

If the window width is w, then the smoothing constant (\(\alpha\)) should be:

\[\alpha = \frac{2}{w + 1}\]
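
As a quick check of this equivalence (a sketch, assuming the forecast package; the series and the window width w = 5 are arbitrary choices), the two methods produce similar one-step-ahead forecasts:

library(forecast)

set.seed(42)
y <- ts(cumsum(rnorm(100)))            # an arbitrary example series

w <- 5                                 # moving average window width
alpha <- 2 / (w + 1)                   # equivalent smoothing constant

ma_forecast  <- mean(tail(y, w))       # trailing moving average forecast
ses_forecast <- ses(y, h = 1, alpha = alpha, initial = 'simple')$mean[1]

c(MA = ma_forecast, SES = ses_forecast)  # close, though not identical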



Question 5 Forecasting Department Store Sales

The file DepartmentStoreSales.xls contains data on the quarterly sales for a department store over a 6-year period. The quarters are numbered 1 through 24.

library(readr)
library(forecast)

# Read the quarterly sales data and convert the sales column to a time series
DepStoreSales <- read_csv("data/DeptStoreSales.csv")
DSS_ts <- ts(DepStoreSales[[2]])

par(mar = c(5,5,3,2))

plot(DSS_ts, type = 'o', pch = 20, lwd = 2, bty ='l',
     ylab = 'Sales (Thousands of $)', xlab = 'Quarter',
     main = 'Department Store Sales Timeseries',
     ylim = c(40000, 110000),
     axes = FALSE)

axis(1, 
     at = 1:24, labels = format(1:24))
axis(2, 
     at = seq(40000, 110000, 10000), labels = format(seq(40, 110, 10)),
     las = 2)


A): Which of the following methods would not be suitable for forecasting this series? Explain why or why not for each one.

  • Moving average of raw series
    • Would not be suitable. The moving average method can’t account for seasonality and lags behind trend (overforecasting in the presence of a decreasing trend and underforecasting in the presence of an increasing trend; see the sketch after this list). This series shows strong seasonality as well as a weak increasing trend.
  • Moving average of deseasonalized series
    • Would also not be suitable. While moving average would work better on the deseasonalized series than the raw series, it is still not ideal. The series has a slight increasing trend so it should be both deseasonalized and detrended before using the moving average method for forecasting.
  • Simple exponential smoothing of the raw series
    • Would not be suitable. As with the moving average method, simple exponential smoothing does not work well for series with seasonality and/or trend.
  • Double exponential smoothing of the raw series
    • Would not be suitable. While double exponential smoothing could deal with the trend in the raw series, it could not account for the seasonality.
  • Holt-Winters exponential smoothing of the raw series
    • This method would be suitable because it takes both trend and seasonality into account.
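
As a small illustration of the lag noted in the first bullet (a sketch; the linear series and window width w = 4 are arbitrary choices), a trailing moving average of a steadily increasing series sits systematically below it and therefore underforecasts:

# A trailing moving average lags behind a linear trend
trend <- ts(1:20)
ma_trailing <- stats::filter(trend, rep(1/4, 4), sides = 1)  # trailing MA, w = 4

# The moving average is (w - 1)/2 = 1.5 units below the series at every point
tail(cbind(trend, ma_trailing))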


B): A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winters exponential smoothing. Specifically, you should call the hw function with the parameter seasonal = "multiplicative". Let the method pick the appropriate values for alpha, beta, and gamma.

# Split a time series into training and validation partitions.
# Note: window() is given c(cycle, sub-period) endpoints and relies on R
# rolling over sub-periods larger than the frequency (e.g. c(1, 20) with
# frequency 4 is quarter 20, i.e. time 5.75).
partitioner <- function(timeseries, validationsize, start_time = NULL, freq = NULL){
     
     nValid <- validationsize
     nTrain <- length(timeseries) - nValid
     
     # Re-declare the frequency if one is supplied
     if(!is.null(freq)){
          timeseries <- ts(timeseries, frequency = freq)
     }
     
     if(is.null(start_time)){
          train_ts <- window(timeseries, start = 1, end = c(1, nTrain))
          valid_ts <- window(timeseries, 
                             start = c(1, nTrain + 1), end = c(1, nTrain + nValid))
     } else {
          train_ts <- window(timeseries, 
                             start = c(start_time, 1), 
                             end = c(start_time, nTrain))
          valid_ts <- window(timeseries,
                             start = c(start_time, nTrain + 1),
                             end = c(start_time, nTrain + nValid))
     }
     
     partitions <- list(Training = train_ts,
                        Validation = valid_ts)
     
     return(partitions)
}

p <- partitioner(timeseries = DSS_ts, validationsize = 4, freq = 4)

train_ts <- p[[1]]
valid_ts <- p[[2]]

library(Hmisc)  # for minor.tick()

par(mar = c(5,5,3,2))

plot(train_ts, type = 'l', pch = 20, lwd = 2,
     ylab = 'Sales (Thousands of $)', xlab = 'Year',
     main = 'Partitioned Timeseries',
     bty ='l', xlim = c(1,7), ylim = c(40000, 110000),
     axes = FALSE)

lines(valid_ts, lty = 2, lwd = 2, col = 'blue')
abline(v = 5.875, lty = 3)

axis(1, 
     at = 1:7, labels = format(1:7))
axis(2, 
     at = seq(40000, 110000, 10000), labels = format(seq(40, 110, 10)),
     las = 2)
minor.tick(nx = 4, ny = 0, tick.ratio =.5)

legend('topleft',
       legend = c('Training', 'Validation'),
       lty = c(1,2),
       lwd = 2,
       col = c('black', 'blue'),
       bty = 'n')


i): Run this method on the data. Request the forecasts for the validation period. (Note that the forecasted values for the validation set will be different than what the book shows.)
HW_DSS <- hw(train_ts, seasonal = 'multiplicative')
# hw() already returns a forecast object; re-forecast from its fitted ets model
HW_pred <- forecast(HW_DSS$model, h = 4)

DSS_ts4 <- ts(DSS_ts, frequency = 4)

par(mar = c(5,5,3,2))

plot(DSS_ts4, type = 'l', lwd = 1,
     ylab = 'Sales (Thousands of $)', xlab = 'Year',
     main = 'Holt-Winters Multiplicative Forecast',
     bty ='l', xlim = c(1,7), ylim = c(40000, 110000),
     axes = FALSE)

axis(1, 
     at = 1:7, labels = format(1:7))
axis(2, 
     at = seq(40000, 110000, 10000), labels = format(seq(40, 110, 10)),
     las = 2)
minor.tick(nx = 4, ny = 0, tick.ratio =.5)

lines(HW_pred$mean, lwd = 2, lty = 2, col = 'blue')
lines(HW_pred$upper[,2], lwd = 2, lty = 3, col = 'blue')
lines(HW_pred$lower[,2], lwd = 2, lty = 3, col = 'blue')

legend('topleft',
       legend = c('Timeseries Data', 
                   'Holt-Winters Forecast',
                  'Prediction Interval Bounds'),
       lty = c(1,2,3),
       lwd = c(1,2,2),
       col = c('black', 'blue', 'blue'),
       bty = 'n')

library(pander)

x <- data.frame(Quarter = c(21,22,23,24),
                Actual = as.numeric(valid_ts),
                Forecast = HW_pred$mean,
                Error = as.numeric(valid_ts) - HW_pred$mean)

pandoc.table(x, digits = 7)
 Quarter   Actual   Forecast      Error   
--------- -------- ---------- -----------
   21      60800    61334.90    -534.8988
   22      64900    64971.30     -71.3030
   23      76997    76718.11     278.8937
   24     103337    99420.55    3916.4482


ii): Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.
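
Recall that the MAPE averages the absolute percentage errors over the \(n\) forecasts being evaluated:

\[\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|\]
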
y <- as.numeric(valid_ts)
MAPE <- mean( abs(y[1:2] - HW_pred$mean[1:2]) / y[1:2] ) * 100

print(paste('The MAPE for Quarters 21 and 22 is ', MAPE, '%', sep = ''))
## [1] "The MAPE for Quarters 21 and 22 is 0.989633784021131%"


C): The fit and the residuals were displayed in the book. Please reproduce them with R code. Using all the information from (b) and your generated figures, is this model suitable for forecasting quarters 21 and 22?

par(mar = c(5,5,3,2), mfrow = c(2,1))

plot(train_ts, type = 'l', lwd = 1,
     ylab = 'Sales (Thousands of $)', xlab = 'Year',
     main = 'Holt-Winters Multiplicative Model Fit',
     bty ='l', xlim = c(1,6), ylim = c(40000, 110000),
     axes = FALSE)

axis(1, 
     at = 1:6, labels = format(1:6))
axis(2, 
     at = seq(40000, 110000, 10000), labels = format(seq(40, 110, 10)),
     las = 2)
minor.tick(nx = 4, ny = 0, tick.ratio =.5)

lines(HW_pred$fitted, lwd = 2, lty = 2, col = 'blue')

legend('topleft',
       legend = c('Timeseries Data', 
                   'Holt-Winters Fitted Values'),
       lty = c(1,2),
       lwd = c(1,2),
       col = c('black', 'blue'),
       bty = 'n')

# The "residuals" returned with the forecast object are actually the percentage errors, not the raw residuals. To get the raw residuals (i.e. forecast errors), I subtracted the fitted values from the original training timeseries

plot((HW_pred$x - HW_pred$fitted), type = 'l', lwd = 1,
     ylab = 'Residual', xlab = 'Year',
     main = 'HW Forecast Errors (Training Data)',
     bty = 'l', xlim = c(1,6), xaxt = 'n')

axis(1, 
     at = 1:6, labels = format(1:6))

minor.tick(nx = 4, ny = 0, tick.ratio =.5)

abline(h = 0, lty = 3)

legend('topleft',
       legend = 'Reference Line: Residual = 0',
       lty = 3,
       bty = 'n')

The model appears to fit the training data well, capturing both the seasonality and the trend. The forecast errors in the training period are similar in magnitude to those in the validation period, suggesting that the model does not overfit the training data. Additionally, the MAPE for quarters 21 and 22 is quite small (approximately 0.5%). With all this in mind, I would say the model is suitable for forecasting quarters 21 and 22.


D): Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality, or the opposite order? Show the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.

par(mar = c(5,5,3,2), mfrow = c(2,2))

plot(diff(DSS_ts4, 1),
     main = 'Lag-1 Difference',
     xlab = 'Year',
     ylab = 'Difference in Sales')

minor.tick(nx = 4, ny = 0, tick.ratio =.5)

plot(diff(DSS_ts4, 4),
     main = 'Lag-4 Difference',
     xlab = 'Year',
     ylab = 'Difference in Sales')

minor.tick(nx = 4, ny = 0, tick.ratio =.5)

plot(diff(diff(DSS_ts4, 1), 4),
     main = 'Twice Differenced (1 then 4)',
     xlab = 'Year',
     ylab = 'Difference in Sales')

minor.tick(nx = 4, ny = 0, tick.ratio =.5)

plot(diff(diff(DSS_ts4, 4), 1),
     main = 'Twice Differenced (4 then 1)',
     xlab = 'Year',
     ylab = 'Difference in Sales')

minor.tick(nx = 4, ny = 0, tick.ratio =.5)

From the above plots it is clear that the order of differencing has no effect. This can be confirmed with R's identical() function, which returns TRUE if the two objects passed to it are the same and FALSE otherwise.

identical(diff(diff(DSS_ts4, 1), 4), diff(diff(DSS_ts4, 4), 1))
## [1] TRUE
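
This is expected algebraically: each difference is a polynomial in the backshift operator \(B\) (where \(B y_t = y_{t-1}\)), and polynomials in \(B\) commute:

\[(1 - B)(1 - B^{4})\,y_t = (1 - B^{4})(1 - B)\,y_t\]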


E): Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.

# Double-difference the training series (lag-1, then lag-4), then use the
# average of the differenced series (meanf) as the 2-quarter-ahead forecast
DoubleDiff <- diff(diff(train_ts, 1), 4)
DD_Forecast <- meanf(DoubleDiff, h = 2)

# I slightly altered the code from Prof. Dean's notes. The code 
# from the notes works when the length of the validation period 
# is equal to the number of seasons. The following code works 
# even when the length of the validation period is different (in 
# this case shorter) than the frequency of the time series.

# Invert the double difference to recover forecasts on the original scale.
# Assumes the forecast horizon is no longer than the frequency, which holds
# here (horizon of 2, frequency of 4).
diffinv.forecast <- function(diffForecasts, traindata, freq, horizon){
     
     realForecasts <- vector()
     nTrain <- length(traindata)
     
     for (i in 1:horizon) {
          if(i == 1) {
               # The first step uses only actual values from the training period
               realForecasts[i] <- diffForecasts[i] + 
                                   traindata[(nTrain+i)-freq] + 
                                   (traindata[nTrain] - traindata[nTrain - freq])
          } else {
               # Later steps chain on the previously reconstructed forecast
               realForecasts[i] <- diffForecasts[i] + 
                                   traindata[(nTrain+i)-freq] + 
                                   (realForecasts[i-1] - traindata[nTrain+i-1-freq])
          }
     }
     
     return(realForecasts)
}
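
The recursion implements the algebraic inversion of the double difference. Writing the twice-differenced series as \(w_t = y_t - y_{t-1} - y_{t-4} + y_{t-5}\) and replacing future \(w\) values with their mean \(\bar{w}\) gives

\[\hat{y}_{n+i} = \bar{w} + \hat{y}_{n+i-1} + y_{n+i-4} - y_{n+i-5},\]

where actual values are used on the right-hand side until the index passes the end of the training period (quarter \(n = 20\)).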

AdjDD_Forecasts <- diffinv.forecast(DD_Forecast$mean, 
                                    traindata = train_ts, 
                                    freq = 4, 
                                    horizon = 2)

print(paste("The forecasts for quarters 21 and 22 using the double-differenced series are ",
            AdjDD_Forecasts[1], ' dollars and ', AdjDD_Forecasts[2], ' dollars, respectively.',
            sep = ''))
## [1] "The forecasts for quarters 21 and 22 using the double-differenced series are 63982.2 dollars and 68177.4 dollars, respectively."


F): Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.

MAPE_Differenced <- mean( abs(y[1:2] - AdjDD_Forecasts) / y[1:2] ) * 100

x <- data.frame(Method = c('Holt-Winters', 'Differencing'),
                MAPE = c(MAPE, MAPE_Differenced))

pandoc.table(x)
    Method      MAPE  
-------------- -------
 Holt-Winters   0.4948
 Differencing   5.142 

I would use the Holt-Winters (HW) method. The MAPE for quarters 21 and 22 is much smaller for the HW method than for the differencing approach.


G): What is an even simpler approach that should be compared as a baseline? Complete that comparison.

An even simpler approach would be a naive forecast, in this case a seasonal naive forecast.
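
For a horizon of \(h \le m\) periods, the seasonal naive forecast simply repeats the value from the same quarter of the previous year:

\[\hat{y}_{n+h} = y_{n+h-m}, \qquad m = 4\]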

SN_DSS <- snaive(train_ts, h = 2)

SN_DSSfc <- as.numeric(SN_DSS$mean)

MAPE_SN <- mean( abs( y[1:2] - SN_DSSfc ) / y[1:2] ) * 100

x <- data.frame(Actual = c(y[1:2], NA),   # no MAPE for the actuals
                Naive = c(SN_DSSfc, MAPE_SN),
                Differencing = c(AdjDD_Forecasts, MAPE_Differenced),
                Holt_Winters = c(HW_pred$mean[1:2], MAPE))

rownames(x) <- c('Quarter 21', 'Quarter 22', 'MAPE')

pandoc.table(x)
               Actual   Naive   Differencing   Holt_Winters
------------ -------- ------- -------------- --------------
 Quarter 21   60800    56405       63982          61335    
 Quarter 22   64900    60031       68177          64971    
    MAPE        NA     7.365       5.142          0.4948   

The Holt-Winters forecast has by far the lowest MAPE. The MAPE of the seasonal naive baseline is roughly 15 times as large as the MAPE of the Holt-Winters forecast.



Question 8 Forecasting Australian Wine Sales

The figure below shows time plots of monthly sales of six types of Australian wines (red, rose, sweet white, dry white, sparkling, and fortified) for 1980-1994. The units are thousands of liters. You are hired to obtain short-term forecasts (2-3 months ahead) for each of the six series, and this task will be repeated every month.

AusWine <- read_csv('data/AustralianWines.csv')
AusWine <- AusWine[1:(nrow(AusWine) - 8),]   # drop the last 8 rows of the file

Fort_ts <- ts(AusWine$Fortified, start = c(1980,1), frequency = 12)
Red_ts <- ts(AusWine$Red, start = c(1980,1), frequency = 12)
Rose_ts <- ts(AusWine$Rose, start = c(1980, 1), frequency = 12)
Spark_ts <- ts(AusWine$sparkling, start = c(1980, 1), frequency = 12)
SwtW_ts <- ts(AusWine$`Sweet white`, start = c(1980, 1), frequency = 12)
DryW_ts <- ts(AusWine$`Dry white`, start = c(1980, 1), frequency = 12)

par(mar = c(5,5,3,2), mfrow = c(3,2))

plot(Fort_ts,
     main = 'Fortified Wine Sales',
     ylab = 'Thousands of Liters',
     xlab = 'Time',
     bty = 'l')

plot(Red_ts,
     main = 'Red Wine Sales',
     ylab = 'Thousands of Liters',
     xlab = 'Time',
     bty = 'l')

plot(Rose_ts,
     main = 'Rose Wine Sales',
     ylab = 'Thousands of Liters',
     xlab = 'Time',
     bty = 'l')

plot(Spark_ts,
     main = 'Sparkling Wine Sales',
     ylab = 'Thousands of Liters',
     xlab = 'Time',
     bty = 'l')

plot(SwtW_ts,
     main = 'Sweet White Wine Sales',
     ylab = 'Thousands of Liters',
     xlab = 'Time',
     bty = 'l')

plot(DryW_ts,
     main = 'Dry White Wine Sales',
     ylab = 'Thousands of Liters',
     xlab = 'Time',
     bty = 'l')


A): Which smoothing method would you choose if you had to choose the same method for forecasting all the series? Why?

Since all six series show seasonality (and several show a trend), I would use the Holt-Winters method, which accounts for both components.
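
For reference, the multiplicative Holt-Winters method updates a level, a trend, and \(m\) seasonal components, and forecasts (for \(h \le m\)) as:

\[\begin{aligned}
\ell_t &= \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}\\
s_t &= \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)\,s_{t-m}\\
\hat{y}_{t+h} &= (\ell_t + h\,b_t)\,s_{t+h-m}
\end{aligned}\]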


B): Fortified wine has the largest market share of the six types of wine. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next two months.

  • Start by partitioning the data using the period until December 1993 as the training period.
  • Apply Holt-Winters exponential smoothing (with multiplicative seasonality) to sales.
Fort_Train <- window(Fort_ts, end = c(1993, 12))

HW_fort <- hw(Fort_Train, seasonal = 'multiplicative')

# hw() already returns a forecast object; re-forecast from its fitted ets model
HW_fort_fc <- forecast(HW_fort$model, h = 12)

library(ggplot2)  # needed for labs() in the plot below

autoplot(Fort_ts) + 
    autolayer(HW_fort$fitted, series = 'Fitted Values (Training)') +
    autolayer(HW_fort_fc$mean, series = 'Forecast (Validation)') +
    labs(y = 'Fortified Wine Sales (Thousands of Liters)')


C): Create a plot of the residuals from Holt-Winters exponential smoothing.

checkresiduals(HW_fort)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
## 
## Model df: 16.   Total lags used: 24


From the residual plot, the ACF of the residuals, and the Ljung-Box test we can see that the residuals are not white noise and in fact retain some of the monthly seasonality present in the raw series.

i): Based on this plot, which of the following statements are reasonable?
Decembers (month 12) are not captured well by the model

This statement is reasonable. The residuals for December are larger than those for the other months. The plot below shows the absolute percent error of the model (on the training data) broken down by month. The percent errors are generally higher for December than for other months, reaching 18% to 19% in some cases.

# For a multiplicative-error model the stored residuals are relative errors,
# so multiplying by 100 expresses them as percentages
Absolute_Percent_Error <- abs(HW_fort$residuals) * 100
ggsubseriesplot(Absolute_Percent_Error)

There is a strong correlation between sales on the same calendar month

Yes; this is clearly evident from the large autocorrelation at lag 12 in the ACF plot of the residuals.

The model does not capture the seasonality well

Since the residuals of the same calendar month are correlated, the model does not appear to have fully captured the seasonality.

We should first deseasonalize the data and then apply Holt-Winter’s exponential smoothing

This does not make sense, since Holt-Winters exponential smoothing is designed for series with both seasonality and trend. If we were to deseasonalize the series first, it would make more sense to use Holt's linear trend model (i.e. double exponential smoothing), as sketched below.
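
A sketch of that alternative (assuming base R's stl and the seasadj and holt functions from the forecast package):

# Deseasonalize with an STL decomposition, then apply Holt's linear trend method
decomp <- stl(Fort_Train, s.window = 'periodic')
Fort_sa <- seasadj(decomp)        # seasonally adjusted training series
holt_fc <- holt(Fort_sa, h = 2)   # double exponential smoothing forecast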


ii): How can you handle the above effect with exponential smoothing?

We could double-difference the series to remove both the trend and the seasonality, and then apply simple exponential smoothing to the twice-differenced data.
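
A minimal sketch of that approach for the fortified wine series (reusing Fort_Train from (b)):

# Remove trend (lag-1) and seasonality (lag-12), then smooth the stationary series
Fort_dd <- diff(diff(Fort_Train, 1), 12)
SES_dd <- ses(Fort_dd, h = 2)

# The forecasts of the differenced series would then be adjusted back to the
# original scale, as in Question 5(e)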