Question 2: To get an equivalent result to the moving average, using exponential smoothing you must select a smoothing constant that places the same importance on fresh information. The smoothing constant can be calculated as w = 2/\(\alpha\) - 1. For example if the window width is 1, the smoothing constant (\(\alpha\)) would be 3.
Question 5: a. Which of the following methods would not be suitable for forecasting this series. Explain why or why not for each one. * Moving average of raw series: This would not be suitable, as you should only use a moving average for forecasting a series with no trend or seasonality. * Moving average of deseasonalized series: This would be appropriate to use, as it removes the seasonality before you utilize the moving average. * Simple exponential smoothing of the raw series: This should not be used, for the same reason as the moving average of the raw series. Simple exponential smoothing is for forecasting a series without any trend or seasonality. * Double exponential smoothing of the raw series: This should not be used as double exponential smoothing only works to remove trend. This data set incorporates seasonality, so we would need to address the seasonality before using a simple or double exponential smoothing method. * Holt-Winter’s exponential smoothing of the raw series: This addresses trend and seasonality so would be suitable to use in forecasting this series.
A forecaster was tasked to generate forecasts for 4 quarters ahead. She therefore partitioned the data so that the last 4 quarters were designated as the validation period. The foreccaster approached the forecasting task by using multiplicative Holt-Winter’s exponential smoothing. Specifically, you should call the ets function with the parameters restrict=FALSE, model = “ZZZ” and use the smoothing constants of ??=0.2??=0.2, ??=0.15??=0.15, and ??=0.05??=0.05.
Run this method on the data. Request the forecasts on the validation period.(Note that the forecasted values for the validation set will be different than what the book shows.)
DeptSales <- read.csv("~/DeptStoreSales.csv", stringsAsFactors = FALSE)
salesTS <- ts(DeptSales$Sales, start=c(1), frequency = 1)
yrange = range(salesTS)
xrange = range(DeptSales$Quarter)
plot(c(1,24), yrange, type="n", xlab="Quarter", ylab="Sales", bty="l", xaxt="n", yaxt="n")
lines(salesTS, bty="l")
axis(1, at=seq(1,24,1), labels=format(seq(1,24,1)))
axis(2, at=seq(30000,90000, 120000), labels=format(seq(30000,90000, 120000)), las=2)
library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.3
validLength <- 4
trainLength <- length(salesTS) - validLength
salesTrain <- window(salesTS, end=c(1, trainLength))
salesValid <- window(salesTS, start=c(1,trainLength+1))
ETSDeptSales <- ets(salesTrain, restrict=FALSE, model = "ZZZ", alpha=0.2, beta= 0.15, gamma = 0.05)
ETSDeptSales
## ETS(M,N,N)
##
## Call:
## ets(y = salesTrain, model = "ZZZ", alpha = 0.2, beta = 0.15,
##
## Call:
## gamma = 0.05, restrict = FALSE)
##
## Smoothing parameters:
## alpha = 0.2
##
## Initial states:
## l = 56082.9095
##
## sigma: 0.2248
##
## AIC AICc BIC
## 443.4999 444.2058 445.4914
ValidPredictSales <- forecast(ETSDeptSales, h=validLength, level = 0)
ValidPredictSales
## Point Forecast Lo 0 Hi 0
## 21 70043.77 70043.77 70043.77
## 22 70043.77 70043.77 70043.77
## 23 70043.77 70043.77 70043.77
## 24 70043.77 70043.77 70043.77
a <- c(21, 22, 23, 24)
salesforecast <- data.frame(a, ValidPredictSales)
plot(salesTS, ylab="Sales", xlab="Quarter", bty="l", main="Department Store Sales by Quarter")
lines(salesforecast, col="red", lwd=2, lty=2)
accuracy(ValidPredictSales, salesValid)
## ME RMSE MAE MPE MAPE MASE
## Training set 3490.214 13300.60 10649.53 2.054374 15.85521 0.7571339
## Test set 6464.733 17809.35 13658.50 4.529848 16.09447 0.9710585
## ACF1 Theil's U
## Training set -0.1103099 NA
## Test set 0.1722954 1.146863
I used the forecast from my previous answer to create the tables showing the fitted forecast and the forecast errors. These looked very different from those shown in the book due to the differences in part (b) provided by the updated question 5 instructions.
plot(c(1,20), yrange, type="n", xlab="Quarter", ylab="Sales", bty="l", xaxt="n", yaxt="n")
lines(salesTS, bty="l")
axis(1, at=seq(1,24,1), labels=format(seq(1,24,1)))
axis(2, at=seq(30000,90000, 120000), labels=format(seq(30000,90000, 120000)), las=2)
lines(ValidPredictSales$fitted, col="blue")
I was unsuccessful in my attempts to plot the forecast error over time for quarters 21-24, so I used a basic workaroun by creating a simple table with the data and generating a plot from that.
21 60800 70043.77 -9243.77 22 64900 70043.77 -5143.77 23 76997 70043.77 6953.23 24 103337 70043.77 33293.23
quarter = c(21, 22, 23, 24)
error = c(-9243.77,-5143.77, 6953.23, 33293.23)
errorCalc <- data.frame(quarter, error)
errorTS <- ts(errorCalc$error, start=c(21), frequency = 1)
plot(errorTS)
To begin I tried removing trend first, then seasonality.
par(mfrow = c(2,2))
plot(salesTS, ylab="Sales", xlab="Quarter", bty="l", main="Department Store Sales by Quarter")
plot(diff(salesTS, lag=1), ylab="Lag-1", xlab="Quarter", bty="l", main="Lag-1 (De-Trend) Difference")
detrenddeseason <- diff(diff(salesTS, lag=1), lag=4)
plot(detrenddeseason, ylab="Lag-1, then Lag-4", xlab="Quarter", bty="l", main="De-Trend, then Deseasonalized")
pointForecasts <- meanf(diff(diff(salesTrain, lag=1), lag=4), h=4)
plot(diff(salesTS, lag=4), ylab="Lag-4", xlab="Quarter", bty="l", main="Lag-4 (De-Seasonality) Difference")
par(mfrow = c(1,1))
deseasondetrend <- diff(diff(salesTS, lag=4), lag=1)
plot(deseasondetrend, ylab="Lag-4, then Lag-1", xlab="Quarter", bty="l", main="Deseasonalize, then De-Trend")
pointForecastslag4 <- meanf(diff(diff(salesTrain, lag=4), lag=1), h=4)
Based on the plots above, it seems there is no impact when you switch the order of de-trending and deseasonalizing. This seems unlikely, but I am not sure how else to calculate the lag-4, then lag-1 and lag-1, then lag-4 other than I did above - where they both yielded the same results.
pointForecasts <- meanf(diff(diff(salesTrain, lag=1), lag=4), h=4)
realForecasts <- vector()
for (i in 1:validLength) {
if(i == 1) {
realForecasts[i] <- pointForecasts$mean[i] + salesTrain[(trainLength+i)-validLength] + (salesTrain[trainLength] - salesTrain[trainLength - validLength])
} else {
realForecasts[i] <- pointForecasts$mean[i] + salesTrain[(trainLength+i)-validLength] + (realForecasts[i-1] - salesTrain[trainLength+i-1-validLength])
}
}
par(mfrow = c(1,1))
a <- c(21, 22, 23, 24)
realForecasts
## [1] 63982.2 68177.4 80201.6 101467.8
forecastplot <- data.frame(a, realForecasts)
plot(salesTS, ylab="Sales", xlab="Quarter", bty="l", main="Department Store Sales by Quarter")
lines(forecastplot, col="red", lwd=2, lty=2)
Looking at the two plots, it seems clear that the forecasted values from e are much more accurate to the actual values in quarters 21-24. The process was also simpler, making it easier to utilize. The accuracy combined with the simplicity would drive me to choose the difference done in (e).
A simple moving average could also be compared as a baseline.
maverage <- ma(salesTrain, order = 4)
maveragetrain <- forecast(maverage, h=validLength, level = 0)
a <- c(19, 20, 21, 22)
maveplot <- data.frame(a, maveragetrain)
plot(salesTS, ylab="Sales", xlab="Quarter", bty="l", main="Department Store Sales by Quarter")
lines(maverage, col="blue", lwd=2, lty=2)
lines(maveplot, col="red", lwd=2, lty=2)
Question 8: (a) I would use differencing if I had to select one method for all of the series of wine data. Some clearly have trend and seasonality, but others seem to have just one or the other. A simple method, such as differencing, that I can apply de-trending or deseasonalizing or both will allow flexibility. Also, as I saw in question 5, more complex doesn’t always mean better.
Wine <- read.csv("~/AustralianWines.csv", stringsAsFactors = FALSE)
wineTS <- ts(Wine$Fortified, start = c(1980, 1), freq = 12)
yrange = range(wineTS)
xrange = range(Wine$Month)
plot(wineTS)
library(forecast)
validLength <- 12
trainLength <- length(wineTS) - validLength
wineTrain <- window(wineTS, end=c(1980, trainLength))
wineValid <- window(wineTS, start=c(1980,trainLength+1))
Once I broke out the first 13 years as the training period and the last twelve months as the validation period, I then applied Holt-Winter’s exponential smoothing.
wineHW <- HoltWinters(wineTrain, seasonal="mult")
wineforecast <- predict(wineHW, n.ahead=12)
plot(wineTS)
lines(wineforecast, col="red", lwd=2, lty=2)
abline(v=1994)
arrows(1980, 5500, 1994, 5500, code=3, length=0.1)
text(1987, 4900, "Training")
abline(v=1994)
arrows(1994, 5500, 1995, 5500, code=3, length=0.1)
text(1994.2, 4900, "Validation")
The following states are reasonable: * The model does not capture the seasonality well. * There is a strong correlation between sales on the same calendar month.