The moving average is a simple smoother: it averages values across a window of consecutive periods, thereby generating a series of averages. A moving average with window width w averages each set of w consecutive values, where w is chosen by the user. For visualization, wider windows expose more global trends, while narrower windows reveal local trends (but may be subject to under-smoothing).
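As a quick illustration (a sketch, not part of the original analysis; the series y and the window widths are made up), trailing moving averages with a narrow and a wide window can be computed with zoo::rollmean:
library(zoo)
y <- ts(c(410, 430, 420, 440, 450, 445, 460, 470, 455, 480))
rollmean(y, k = 3, align = "right")  # narrow window: tracks local movement, risks under-smoothing
rollmean(y, k = 7, align = "right")  # wide window: smoother, exposes the more global trend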
Simple exponential smoothing is similar to forecasting with a moving average, except that instead of taking a simple average over the w most recent values, we take a weighted average of all past values, with weights that decrease exponentially into the past. To achieve an equivalent result with simple exponential smoothing, the smoothing constant needs to be closer to 1, which indicates fast learning, so that only the most recent values influence the forecast. (A value close to 0 indicates slow learning, where past observations have a large influence on the forecast.)
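To make the contrast concrete, here is a small sketch (again with a made-up series y) using ses() from the forecast package with a smoothing constant near 1 versus near 0:
library(forecast)
y <- ts(c(410, 430, 420, 440, 450, 445, 460, 470, 455, 480))
ses(y, alpha = 0.9, h = 4)  # alpha near 1: fast learning, forecasts driven by the most recent values
ses(y, alpha = 0.1, h = 4)  # alpha near 0: slow learning, past observations carry more weight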
Moving average of the raw series. Not suitable: moving averages can generally be used only for series without trend and seasonality, and this series has both.
Moving average of the deseasonalized series. Not suitable for the same reason, although it could be used if the series were both deseasonalized and detrended.
Simple exponential smoothing of the raw series. Not suitable: like moving averages, it should only be used on series without trend and seasonality.
Double exponential smoothing of the raw series. Not suitable: double exponential smoothing handles series with trend, but not seasonality.
Holt-Winters' exponential smoothing of the raw series. Suitable, because it handles series with trend, seasonality, and a level that changes over time.
library(forecast)
library(zoo)
library(dplyr)
library(timeDate)
DeptStoreSales<- read.csv("DeptStoreSales.csv", stringsAsFactors = FALSE)
str(DeptStoreSales)
## 'data.frame': 24 obs. of 2 variables:
## $ Quarter: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sales : int 50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
DeptStoreSales.TS <- ts(DeptStoreSales$Sales, start=c(1,1), frequency=4)
yrange = range(DeptStoreSales.TS)
ValidLength <- 4
TrainLength <-length(DeptStoreSales.TS) - ValidLength
DeptTrain <- window(DeptStoreSales.TS, start = c(1), end=c(1, TrainLength))
DeptValid <- window(DeptStoreSales.TS, start=c(1,TrainLength + 1), end=c(1, ValidLength + TrainLength))
HWSales <- hw(DeptTrain, seasonal="multiplicative", h=4)
summary(HWSales)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = DeptTrain, h = 4, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4032
## beta = 0.1429
## gamma = 0.4549
##
## Initial states:
## l = 57401.8119
## b = 605.4045
## s=1.3012 0.9795 0.8614 0.8579
##
## sigma: 0.0258
##
## AIC AICc BIC
## 372.3936 390.3936 381.3552
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 246.631 1499.632 977.6487 0.3530739 1.665686 0.3137009
## ACF1
## Training set -0.07882461
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 61334.90 59303.21 63366.58 58227.70 64442.09
## 6 Q2 64971.30 62529.36 67413.25 61236.67 68705.94
## 6 Q3 76718.11 73376.84 80059.37 71608.08 81828.13
## 6 Q4 99420.55 94372.29 104468.81 91699.90 107141.20
accuracy(HWSales, DeptValid, test = 1)  # accuracy for the first validation quarter (quarter 21) only
## ME RMSE MAE MPE MAPE MASE ACF1
## Test set -534.8988 534.8988 534.8988 -0.8797678 0.8797678 0.1716345 NA
## Theil's U
## Test set NaN
accuracy(HWSales, DeptValid, test = 2)  # accuracy for the second validation quarter (quarter 22) only
## ME RMSE MAE MPE MAPE MASE ACF1
## Test set -71.303 71.303 71.303 -0.1098659 0.1098659 0.02287919 NA
## Theil's U
## Test set 0.01739097
yrange <- range(DeptStoreSales.TS)  # range of the raw series, reused for later plots
plot(c(1,5.5), yrange, xlab="Year", ylab="Sales Dollars (in thousands)", ylim=c(50,90), bty="l", main="Exponential Smoothing: Actual Vs. Forecast (Training Data)")
lines(DeptStoreSales.TS/1000, bty="l", col = "blue", lwd=2)
lines(HWSales$fitted/1000, col="red", lwd=2)
legend("topleft", c("Actual", "Forecasted"), lty=c(1,1), col=c("blue", "red"), bty="n")
plot(HWSales$residuals, xlab = "Year", ylab = "Residual Values of Sales", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Exponential Smoothing Forecast Errors (Training Data)")
axis(1,at=seq(1,7,1), labels = format(seq(1,7,1)), pos = 0)
axis(2,at=seq(-0.06,0.07, .01),labels = format(seq(-6000,7000,1000)),las = 2)  # hw() with multiplicative seasonality returns residuals on a relative scale; the labels map them roughly to dollars
Based on the results above, the forecast is a good predictor of quarters 21 and 22. The MAPE values are small and the residuals are also fairly low, so the model seems to fit the data quite well.
par(mfrow = c(2,2))
plot(DeptStoreSales.TS/1000, ylab = "Sales Dollars (in thousands)", xlab = "Year", bty = "l", main = "Department Store Sales")
plot(diff(DeptStoreSales.TS, lag=4), ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
plot(diff(DeptStoreSales.TS,lag=1), ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
Lag4Lag1<- diff(diff(DeptStoreSales.TS, lag=4), lag=1)
plot(Lag4Lag1, ylab = "Lag-4, then Lag-1", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-4, Lag-1)")
par(mfrow = c(2,2))
plot(DeptStoreSales.TS/1000, ylab = "Sales Dollas (in thousands)", xlab = "Year", bty = "l", main = "Dept Store Sales")
plot(diff(DeptStoreSales.TS, lag=1), ylab = "Lag-1", xlab = "Year", bty = "l", main = "Lag-1 Difference")
plot(diff(DeptStoreSales.TS, lag=4), ylab = "Lag-4", xlab = "Year", bty = "l", main = "Lag-4 Difference")
Lag1Lag4<- diff(diff(DeptStoreSales.TS, lag=1), lag=4)
plot(Lag1Lag4, ylab = "Lag-1, then Lag-4", xlab = "Year", bty = "l", main = "Twice-Differenced (Lag-1, Lag-4)")
It appears that it makes no difference whether you deseasonalize first (lag-4 difference) and then detrend (lag-1 difference) or vice versa; the two differencing operations commute.
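A quick numeric check (a sketch, not part of the original output) makes this concrete; because the differencing operations commute, the comparison should come back TRUE:
all.equal(as.numeric(Lag4Lag1), as.numeric(Lag1Lag4))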
DDForecast <- meanf(diff(diff(DeptTrain, lag = 4), lag = 1), h = 2)  # mean forecast of the twice-differenced training series for quarters 21 and 22
DDForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 569.2 -2116.935 3255.335 -3714.114 4852.514
## 6 Q2 569.2 -2116.935 3255.335 -3714.114 4852.514
RealForecast <- vector()
# Re-integrate the double-differenced forecasts back to the original scale:
# F_t = DDforecast_t + y_{t-4} + (y_{t-1} - y_{t-5}), using an earlier forecast in
# place of the actual once we are past the training data. Only the first two entries
# are filled in because DDForecast contains h = 2 forecasts; the rest remain NA.
for (i in 1:ValidLength) {
  if (i == 1) {
    RealForecast[i] <- DDForecast$mean[i] + DeptTrain[(TrainLength+i)-ValidLength] + (DeptTrain[TrainLength] - DeptTrain[TrainLength - ValidLength])
  } else {
    RealForecast[i] <- DDForecast$mean[i] + DeptTrain[(TrainLength+i)-ValidLength] + (RealForecast[i-1] - DeptTrain[TrainLength+i-1-ValidLength])
  }
}
RealForecast
## [1] 63982.2 68177.4 NA NA
accuracy(RealForecast, DeptValid)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -3229.8 3230.151 3229.8 -5.141902 5.141902 -0.5 0.13634
Personally, I would pick the Holt-Winters' model: it is easier to use and takes fewer steps to compute and understand. Additionally, the MAPE of the double-differencing approach is above 5%, whereas the MAPE values of the Holt-Winters' model are significantly lower.
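For reference, the two validation MAPEs can be pulled side by side (a sketch reusing objects created above; test = 1:2 restricts the Holt-Winters' accuracy to the same two quarters covered by the double-differenced forecasts):
c(HoltWinters = accuracy(HWSales, DeptValid, test = 1:2)["Test set", "MAPE"],
  DoubleDiff = accuracy(RealForecast, DeptValid)["Test set", "MAPE"])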
An even simpler approach would be the seasonal naive forecasting method, since there is clear seasonality in the data:
DeptSnaive <- snaive(DeptTrain,h=ValidLength)
DeptSnaive
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 56405 51434.14 61375.86 48802.72 64007.28
## 6 Q2 60031 55060.14 65001.86 52428.72 67633.28
## 6 Q3 71486 66515.14 76456.86 63883.72 79088.28
## 6 Q4 92183 87212.14 97153.86 84580.72 99785.28
plot(c(1,7),yrange,type="n",xlab="Year",ylab="Sales Dollars (in thousands)",bty="l",xaxt="n",yaxt="n", main = "Comparison of Models")
axis(1, at=seq(1,7,1), labels=format(seq(1,7,1)))
axis(2, at=seq(0,110000,10000), labels=format(seq(0,110,10)), las=2)
lines(DeptStoreSales.TS)
lines(DeptSnaive$mean, col="green", lwd=2)
lines(HWSales$mean,lwd=2,col="red")
legend(2,100000, c("Actual","Holt-Winters'", "Seasonal Naive Forecast"),lwd=2, col=c("black", "red","green"), bty="n")
accuracy(DeptSnaive, DeptValid)
## ME RMSE MAE MPE MAPE MASE
## Training set 2925.25 3878.784 3116.50 4.344358 4.737739 1.000000
## Test set 6482.25 7032.176 6482.25 8.170540 8.170540 2.079978
## ACF1 Theil's U
## Training set 0.54855812 NA
## Test set 0.01334402 0.4705349
Despite being simpler than the Holt-Winters' method, the seasonal naive forecasting method is not ideal for this situation. As the chart above shows, the seasonal naive method significantly underforecasts sales. Additionally, the MAPE of the naive forecast is over 8% on the test set, which is considerably worse than the Holt-Winters' model's performance.
All six wine types show some seasonality, and most show some trend. Because of that, I need a method that can accommodate both of these conditions, which leads me to select the Holt-Winters' method.
WineSales <- read.csv("AustralianWines.csv", stringsAsFactors = FALSE)
WineSales <- na.omit(WineSales)
WineSales.TS <- ts(WineSales$Fortified, start = c(1980, 1), frequency = 12)
WValidLength <- 12
WTrainLength <- length(WineSales.TS) - WValidLength
WSalesTrain <- window(WineSales.TS,end = c(1980, WTrainLength))
WSalesValid <- window(WineSales.TS, start = c(1980, WTrainLength + 1), end = c(1980, WTrainLength + WValidLength))
HWWine<- hw(WSalesTrain, seasonal="multiplicative", h=12)
HWWine
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 1326.700 1178.965 1474.436 1100.758 1552.642
## Feb 1994 1566.896 1392.151 1741.641 1299.647 1834.145
## Mar 1994 1857.505 1650.032 2064.979 1540.202 2174.809
## Apr 1994 2003.232 1779.128 2227.335 1660.494 2345.969
## May 1994 2396.889 2128.312 2665.466 1986.136 2807.642
## Jun 1994 2426.159 2153.851 2698.467 2009.700 2842.618
## Jul 1994 2957.688 2625.158 3290.219 2449.127 3466.249
## Aug 1994 2736.162 2428.002 3044.322 2264.872 3207.452
## Sep 1994 2025.046 1796.568 2253.523 1675.620 2374.472
## Oct 1994 1879.638 1667.178 2092.097 1554.709 2204.566
## Nov 1994 2164.681 1919.545 2409.818 1789.778 2539.585
## Dec 1994 2276.860 2018.526 2535.193 1881.772 2671.947
accuracy(HWWine, WSalesValid)
## ME RMSE MAE MPE MAPE MASE
## Training set -22.06217 280.3027 221.7110 -1.459040 7.260039 0.7967865
## Test set 115.04537 337.3427 265.2853 3.739657 11.246490 0.9533845
## ACF1 Theil's U
## Training set 0.03407816 NA
## Test set 0.01658673 0.7213012
HWWine$mean[1:2]
## [1] 1326.700 1566.896
plot(HWWine, ylab = "Sales (in thousands)")
plot(HWWine$residuals, xlab = "Year", ylab = "Residuals (in thousands of liters)", bty = "l", xaxt = "n", yaxt = "n", lwd = 2, main = "Residuals of Fortified Wine Sales", col = "green")
axis(1,at=seq(1980,1994,1), labels = format(seq(1980,1994,1)))
axis(2,at=seq(-.5,.5,.05),labels = format(seq(-5000,5000,500)),las = 2)
checkresiduals(HWWine)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 39.543, df = 8, p-value = 3.897e-06
##
## Model df: 16. Total lags used: 24
Decembers (month 12) are not captured well by the model: Reasonable. Residuals tend to be very high in the December months (see the quick check sketched after this list).
There is a strong correlation between sales in the same calendar month: Somewhat reasonable. The actual and fitted series show the seasonality more clearly than the residual plot does, but residuals are close to zero in some periods (for example, in the third quarter), which is consistent with correlation by calendar month.
The model does not capture the seasonality well: Unreasonable. This model is designed to capture seasonality. Additionally, the residuals swing in opposite directions in the Decembers of different years, indicating that sales can miss in either direction and that seasonality is not the reason.
We should first deseasonalize the data and then apply Holt-Winters' exponential smoothing: Unreasonable. The method is designed to handle seasonality as well as trend, so deseasonalizing first is unnecessary.
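One quick way to check the first statement (a sketch, not part of the original output) is to compare the mean absolute training residual by calendar month; December is month 12, so a noticeably larger value there would support the claim:
tapply(abs(as.numeric(HWWine$residuals)), cycle(HWWine$residuals), mean)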
While we are modeling monthly seasonality, it may also be worth looking at annual seasonality. A double-seasonal Holt-Winters' model may be helpful here, especially if there is more than one seasonal pattern in the series, as the December numbers suggest.