Chapter 5 Problem 5
We will read in the data and decompose it into trend and seasonal components to see which methods are viable. The data are quarterly, and the decompose function needs a frequency above 1, so we set the frequency to 4.
library(forecast)
library(ggplot2)
#Bring in data
dep <- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
#Create time series
depTS <- ts(dep$Sales, frequency=4)
#Break down into components and plot
depTScomponent <- decompose(depTS)
autoplot(depTScomponent)
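As a rough illustration of what decompose returns, the sketch below runs it on a small synthetic quarterly series. The values and the names `quarter_index`, `y`, and `parts` are assumptions made up for the example; they are not the department store data.

```r
# Synthetic quarterly series (illustrative values, not the department store data)
set.seed(1)
quarter_index <- 1:24
seasonal_pattern <- rep(c(-5, -3, 0, 8), times = 6)
y <- ts(50 + 0.8 * quarter_index + seasonal_pattern + rnorm(24, sd = 0.5),
        frequency = 4)

# Additive classical decomposition: y = trend + seasonal + random
parts <- decompose(y)

# The trend is a centered moving average, so the first two and
# last two quarters have no trend estimate (NA)
parts$trend

# One seasonal index per quarter; additive indices are centered to sum to zero
parts$figure
```

A clear slope in `parts$trend` together with non-trivial values in `parts$figure` is the pattern that rules out methods unable to handle trend or seasonality.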

(a) Which of the following methods would not be suitable for forecasting this series? Explain why or why not for each one.
*Moving average of raw series- Not suitable, because the series has both trend and seasonality.
*Moving average of deseasonalized series- Not suitable, because the deseasonalized series still has a trend.
*Simple exponential smoothing of the raw series- Not suitable, because the series has both trend and seasonality.
*Double exponential smoothing of the raw series- Not suitable, because it handles the trend but not the seasonality present in the raw series.
*Holt-Winters’ exponential smoothing of the raw series- Suitable, because it handles both trend and seasonality.
(b) A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The forecaster approached the forecasting task by using multiplicative Holt-Winters’ exponential smoothing. Specifically, you should call the hw function with the parameter seasonal="multiplicative". Let the method pick the appropriate parameters for α, β, and γ.
i. Run this method on the data. Request the forecasts on the validation period. (Note that the forecasted values for the validation set will be different from what the book shows.)
validLength <- 4
trainLength <- length(depTS) - validLength
depTrain <- window(depTS, end=c(1, trainLength))
depValid <- window(depTS, start=c(1,trainLength+1))
depHWMul <- hw(depTrain, seasonal="multiplicative", h=4)
summary(depHWMul)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = depTrain, h = 4, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4032
## beta = 0.1429
## gamma = 0.4549
##
## Initial states:
## l = 57401.8119
## b = 605.4045
## s=1.3012 0.9795 0.8614 0.8579
##
## sigma: 0.0258
##
## AIC AICc BIC
## 372.3936 390.3936 381.3552
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
## ACF1
## Training set -0.07882461
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 61334.90 59303.21 63366.58 58227.70 64442.09
## 6 Q2 64971.30 62529.36 67413.25 61236.67 68705.94
## 6 Q3 76718.11 73376.84 80059.37 71608.08 81828.13
## 6 Q4 99420.55 94372.29 104468.81 91699.90 107141.20
autoplot(depHWMul)
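The partition-then-forecast workflow can also be sketched with base R alone. This is a minimal sketch under stated assumptions: the series is synthetic, the smoothing parameters are fixed illustrative values rather than estimates, and it uses stats::HoltWinters rather than hw, so its numbers will generally not match the hw output above.

```r
# Synthetic quarterly series with trend and multiplicative seasonality
# (illustrative values, not the department store data)
idx <- 1:24
y <- ts((50 + 1.5 * idx) * rep(c(0.9, 0.95, 1.0, 1.15), times = 6),
        frequency = 4)

# Partition by position: the last 4 quarters form the validation period
n_valid <- 4
n_train <- length(y) - n_valid
train <- window(y, end = time(y)[n_train])
valid <- window(y, start = time(y)[n_train + 1])

# Base-R Holt-Winters with multiplicative seasonality.
# alpha, beta and gamma are fixed, illustrative values (an assumption);
# leaving them at their NULL defaults makes HoltWinters() estimate them.
fit <- HoltWinters(train, alpha = 0.4, beta = 0.1, gamma = 0.3,
                   seasonal = "multiplicative")

# Forecast the validation period and compute the MAPE by hand
fc <- predict(fit, n.ahead = n_valid)
mape <- mean(abs((as.numeric(valid) - as.numeric(fc)) / as.numeric(valid))) * 100
mape
```

Indexing the split through `time(y)` avoids having to work out the year and quarter of the last training observation by hand.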

ii. Using the forecasts for the validation set that you came up with in i. above, compute the MAPE values for the forecasts of quarters 21-22.
Q21 <- window(depTS, start = c(1,21), end= c(1,21))
accuracy(depHWMul, Q21)
## ME RMSE MAE MPE MAPE MASE
## Training set 246.6310 1499.6322 977.6487 0.3530739 1.6656858 0.3137009
## Test set -534.8988 534.8988 534.8988 -0.8797678 0.8797678 0.1716345
## ACF1
## Training set -0.07882461
## Test set NA
The training set MAPE is 1.67%, and the test MAPE for Q21 is even lower at 0.88%.
Q22 <- window(depTS, start = c(1,22), end= c(1,22))
accuracy(depHWMul, Q22)
## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.6656858 0.31370087
## Test set -71.303 71.303 71.3030 -0.1098659 0.1098659 0.02287919
## ACF1
## Training set -0.07882461
## Test set NA
The training set MAPE is again 1.67%, and the test MAPE for Q22 is lower still at 0.11%.
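The test-set MAPE values can be reproduced by hand from the numbers printed above. In the sketch below the forecasts and errors are copied from the hw and accuracy output; the actual sales figures are inferred as forecast plus error rather than read from the CSV, which is an assumption of this example.

```r
# Point forecasts for Q21 and Q22, copied from the hw() output above
forecasts <- c(Q21 = 61334.90, Q22 = 64971.30)

# Test-set ME (actual - forecast) reported by accuracy() for each quarter
mean_errors <- c(Q21 = -534.8988, Q22 = -71.303)

# Actuals inferred from forecast + error (an inference, not read from the CSV)
actuals <- forecasts + mean_errors

# Absolute percentage error for each single-quarter test set
ape <- abs(actuals - forecasts) / actuals * 100
round(ape, 2)
```

With a single observation in each test set, the MAPE is just that quarter's absolute percentage error.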
(d) Another analyst decided to take a much simpler approach, and instead of using exponential smoothing he used differencing. Use differencing to remove the trend and seasonal pattern. Which order works better: first removing trend and then seasonality or the opposite order? Show the progression of time plots as you difference the data and each final series to provide evidence in support of your answer.
First we will deseasonalize before de-trending. Because the data are quarterly, the seasonal difference uses lag 4.
#Seasonal (lag-4) difference of the training series
lag4 <- diff(depTrain, lag=4)
autoplot(lag4)

Then we de-trend.
#Lag-1 difference removes the remaining trend
lag4ThenLag1 <- diff(lag4, lag=1)
autoplot(lag4ThenLag1)

Next we will do the opposite: de-trend, then deseasonalize.
#Lag-1 difference first
lag1 <- diff(depTrain, lag=1)
autoplot(lag1)

#Then the seasonal (lag-4) difference
lag1ThenLag4 <- diff(lag1, lag=4)
autoplot(lag1ThenLag4)

We can see that we end up with the same final series regardless of the order in which we perform the differencing, because the two differencing operations commute.
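The order-independence can be checked numerically as well as visually. The sketch below uses a synthetic quarterly series (an assumption, not the department store data) with a seasonal lag of 4.

```r
# Synthetic quarterly series with trend, seasonality and noise (illustrative)
set.seed(42)
idx <- 1:20
y <- ts(100 + 2 * idx + rep(c(-4, 1, 6, -3), times = 5) + rnorm(20),
        frequency = 4)

# Seasonal difference first, then lag-1 difference
seasonal_then_trend <- diff(diff(y, lag = 4), lag = 1)

# Lag-1 difference first, then seasonal difference
trend_then_seasonal <- diff(diff(y, lag = 1), lag = 4)

# Both orders give the same series, up to floating-point rounding
all.equal(seasonal_then_trend, trend_then_seasonal)
```

Both results equal y[t] - y[t-1] - y[t-4] + y[t-5], which is why the order does not matter.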
(e) Forecast quarters 21-22 using the average of the double-differenced series from (d). Remember to use only the training period (until quarter 20), and to adjust back for the trend and seasonal pattern.
#Get the point forecast by averaging
pointForecasts <- meanf(diff(diff(depTrain, lag=12), lag=1), h=2)
#Convert back to original time series
realForecasts <- vector()
validLength1 <- 2
for (i in 1:validLength1) {
if(i == 1) {
realForecasts[i] <- pointForecasts$mean[i] + depTrain[(trainLength+i)-validLength1] + (depTrain[trainLength] - depTrain[trainLength - validLength1])
} else {
realForecasts[i] <- pointForecasts$mean[i] + depTrain[(trainLength+i)-validLength1] + (realForecasts[i-1] - depTrain[trainLength+i-1-validLength1])
}
}
# See what they look like
realForecasts
## [1] 105495 128049
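For reference, here is a minimal sketch of how a double-differenced forecast can be mapped back to the original scale when the seasonal lag is 4. If w[t] = (y[t] - y[t-4]) - (y[t-1] - y[t-5]), then a forecast of w converts back through y[t] = w + y[t-4] + (y[t-1] - y[t-5]). The helper name `undo_double_diff` and the test series are hypothetical, and this is not the code that produced the output above.

```r
# Hypothetical helper: forecast h steps ahead from the mean of the
# double-differenced series (lag m, then lag 1) and undo both differences
undo_double_diff <- function(train, h, m = 4) {
  w_bar <- mean(diff(diff(train, lag = m), lag = 1))
  y <- as.numeric(train)
  n <- length(y)
  for (i in seq_len(h)) {
    t_now <- n + i
    # y[t] = w + y[t-m] + (y[t-1] - y[t-m-1]); earlier forecasts are reused
    y[t_now] <- w_bar + y[t_now - m] + (y[t_now - 1] - y[t_now - m - 1])
  }
  y[(n + 1):(n + h)]
}

# Synthetic check: quadratic trend plus a fixed quarterly pattern, so the
# double-differenced series is constant and the forecasts should be exact
idx <- 1:24
truth <- 100 + 2 * idx + 0.5 * idx^2 + rep(c(-3, 1, 5, -3), times = 6)
train <- ts(truth[1:20], frequency = 4)
fc <- undo_double_diff(train, h = 4)
cbind(forecast = fc, actual = truth[21:24])
```

On a series like this the recovered forecasts stay on the same scale as the data, which is a quick sanity check for any back-transformation.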
(f) Compare the forecasts from (e) to the exponential smoothing forecasts found in (b). Which of the two forecasting methods would you choose? Explain.
We will check the accuracy of the new forecast.
accuracy(realForecasts, Q21)
## ME RMSE MAE MPE MAPE
## Test set -44695 44695 44695 -73.51151 73.51151
accuracy(realForecasts, Q22)
## ME RMSE MAE MPE MAPE
## Test set -40595 40595 40595 -62.55008 62.55008
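The differenced forecasts (105,495 and 128,049) are far above the actual sales, which the Holt-Winters’ output above implies were about 60,800 in Q21 and 64,900 in Q22. Errors of this size point to the back-transformation in (e) rather than to differencing itself: the code differences at lag 12 and adds back values from 2 quarters earlier, whereas quarterly data call for a seasonal lag of 4. Note also that accuracy(realForecasts, Q22) scores the first forecast (105,495) against the Q22 actual, not the second. Taking the results as computed, the validation MAPE for the differencing method (73.5% and 62.6%) is far worse than for the Holt-Winters’ method (0.88% and 0.11%), so the Holt-Winters’ method is the better choice.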
(g) What is an even simpler approach that should be compared as a baseline? Complete that comparison.
The naive forecast (or, for seasonal data, the seasonal naive forecast) is a simple baseline to compare against.
accuracy(snaive(depTrain), Q21)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 2925.25 3878.784 3116.5 4.344358 4.737739 1.000000 0.5485581
## Test set 4395.00 4395.000 4395.0 7.228618 7.228618 1.410236 NA
accuracy(snaive(depTrain), Q22)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 2925.25 3878.784 3116.5 4.344358 4.737739 1.00000 0.5485581
## Test set 4869.00 4869.000 4869.0 7.502311 7.502311 1.56233 NA
We can see here that the seasonal naive forecast (MAPE of 7.23% for Q21 and 7.50% for Q22) is much better than the double-differenced forecast based on the MAPE; however, the Holt-Winters’ method still proves to be the best option.
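The seasonal naive forecast is simple enough to write by hand: each forecast repeats the value from the same season one cycle earlier. The helper `snaive_manual` below is a hypothetical base-R sketch of that idea, not the forecast package's implementation, and the input values are made up.

```r
# Hypothetical helper: repeat the last observed season as the forecast
snaive_manual <- function(train, h) {
  m <- frequency(train)
  last_season <- as.numeric(tail(train, m))
  rep(last_season, length.out = h)
}

# Two years of made-up quarterly data
train <- ts(c(10, 20, 30, 40, 11, 21, 31, 41), frequency = 4)

# Forecast six quarters ahead: the last year repeats, then wraps around
snaive_manual(train, h = 6)
```

Because it needs no fitting at all, any candidate method should at least beat this baseline on the validation period.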
Chapter 5 Problem 8
(a) Which smoothing method would you choose if you have to choose the same method for forecasting all series? Why?
Looking at all 6 plots, all of them have some sort of seasonality, but not all have trends. Because of this, I would choose the Holt-Winters’ exponential smoothing method, since it can handle seasonality with or without a trend.
(b) Fortified wine has the largest market share of the six types of wine. You are asked to focus on fortified wine sales alone and produce as accurate a forecast as possible for the next two months.
*Partition the data using the period until Dec-1993 as the training period.
#Bring in data
aw <- read.csv("AustralianWines.csv", stringsAsFactors = FALSE)
#Create time series
fortTS <- ts(aw$Fortified, start=c(1980, 1), frequency=12)
autoplot(fortTS)

#Create validation and training periods.
fortTrain <- window(fortTS, end=c(1993, 12))
fortValid <- window(fortTS, start=c(1994,1))
*Apply Holt-Winters’ exponential smoothing (with multiplicative seasonality) to sales.
fortHWMul <- hw(fortTrain, seasonal="multiplicative", h=12)
summary(fortHWMul)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = fortTrain, h = 12, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.0539
## beta = 5e-04
## gamma = 1e-04
##
## Initial states:
## l = 3977.0021
## b = -7.7979
## s=1.0924 1.0337 0.8934 0.958 1.2884 1.3862
## 1.1318 1.113 0.9259 0.8547 0.7177 0.6049
##
## sigma: 0.0869
##
## AIC AICc BIC
## 2759.531 2763.611 2812.638
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -22.06217 280.3027 221.711 -1.45904 7.260039 0.7967865
## ACF1
## Training set 0.03407816
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 1326.700 1178.965 1474.436 1100.758 1552.642
## Feb 1994 1566.896 1392.151 1741.641 1299.647 1834.145
## Mar 1994 1857.505 1650.032 2064.979 1540.202 2174.809
## Apr 1994 2003.232 1779.128 2227.335 1660.494 2345.969
## May 1994 2396.889 2128.312 2665.466 1986.136 2807.642
## Jun 1994 2426.159 2153.851 2698.467 2009.700 2842.618
## Jul 1994 2957.688 2625.158 3290.219 2449.127 3466.249
## Aug 1994 2736.162 2428.002 3044.322 2264.872 3207.452
## Sep 1994 2025.046 1796.568 2253.523 1675.620 2374.472
## Oct 1994 1879.638 1667.178 2092.097 1554.709 2204.566
## Nov 1994 2164.681 1919.545 2409.818 1789.778 2539.585
## Dec 1994 2276.860 2018.526 2535.193 1881.772 2671.947
We will check on the accuracy compared to the validation period.
accuracy(fortHWMul, fortValid)
## ME RMSE MAE MPE MAPE MASE
## Training set -22.06217 280.3027 221.7110 -1.459040 7.260039 0.7967865
## Test set 115.04537 337.3427 265.2853 3.739657 11.246490 0.9533845
## ACF1 Theil's U
## Training set 0.03407816 NA
## Test set 0.01658673 0.7213012
(c) Create a plot for the residuals from the Holt-Winters’ exponential smoothing.
checkresiduals(fortHWMul)

##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
##
## Model df: 16. Total lags used: 24
We will also do a seasonal plot of the residuals
ggseasonplot(resid(fortHWMul))

i. Based on the plot, which of the following statements are reasonable?
*Decembers are not captured well by the model- Reasonable. The seasonal plot shows that the December residuals are all either well above or well below 0.
*There is a strong correlation between sales in the same calendar month- Reasonable. The residual ACF values at lags that are multiples of 12 show a consistent pattern.
*The model does not capture the seasonality well- Reasonable. The Ljung-Box test (p-value = 3.897e-06) and the residual ACF indicate that the residuals are not white noise, so some seasonal structure remains in them.
*We should first deseasonalize the data and then apply Holt-Winters’ exponential smoothing- Not reasonable. The Holt-Winters’ method is designed to work with series that contain both trend and seasonality, so deseasonalizing first is unnecessary.
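To show how leftover seasonality appears in these diagnostics, the sketch below applies the base-R Ljung-Box test and ACF to two synthetic residual series, one pure noise and one with a 12-month cycle. The series are assumptions for illustration, not the wine residuals.

```r
# Synthetic monthly "residuals" (illustrative only)
set.seed(7)
n <- 120
month <- 1:n
white_noise <- rnorm(n)
seasonal_resid <- 0.8 * sin(2 * pi * month / 12) + rnorm(n, sd = 0.5)

# Ljung-Box test over 24 lags; a small p-value rejects white noise
lb_noise <- Box.test(white_noise, lag = 24, type = "Ljung-Box")
lb_seasonal <- Box.test(seasonal_resid, lag = 24, type = "Ljung-Box")
c(noise = lb_noise$p.value, seasonal = lb_seasonal$p.value)

# Autocorrelation at lag 12 (element 1 of $acf is lag 0, so lag 12 is element 13)
acf_vals <- acf(seasonal_resid, lag.max = 24, plot = FALSE)$acf
acf_vals[13]
```

The checkresiduals output above reports df = 8 because the 16 model degrees of freedom are subtracted from the 24 lags; the `fitdf` argument of Box.test plays the same role.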
ii. How can you handle the above effect with exponential smoothing?
autoplot(decompose(fortTS))

Looking at the components of the original series, it has trend and seasonality. One option is to de-trend and deseasonalize the data by differencing twice (lag 12 for the monthly seasonality, then lag 1 for the trend) and then apply simple exponential smoothing to the differenced series.
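A minimal sketch of this idea in base R follows, assuming a synthetic monthly series in place of the fortified wine data. Simple exponential smoothing is obtained here from stats::HoltWinters by switching off the trend and seasonal components; the forecast package's ses function could be used instead.

```r
# Synthetic monthly series with a downward trend and a yearly cycle
# (illustrative values, not the fortified wine data)
set.seed(3)
idx <- 1:96
y <- ts(2000 - 5 * idx + 400 * sin(2 * pi * idx / 12) + rnorm(96, sd = 50),
        start = c(1980, 1), frequency = 12)

# Remove seasonality (lag 12) and then trend (lag 1)
dd <- diff(diff(y, lag = 12), lag = 1)

# Simple exponential smoothing: Holt-Winters with no trend and no seasonality;
# alpha is estimated by minimising the squared one-step errors
ses_fit <- HoltWinters(dd, beta = FALSE, gamma = FALSE)
ses_fit$alpha

# Two-month-ahead forecasts, still on the double-differenced scale
ses_fc <- predict(ses_fit, n.ahead = 2)
ses_fc
```

These forecasts are for the differenced series, so they would still need to be adjusted back for trend and seasonality before being compared with actual sales.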