Ch.9: Q.1-3
Q.1
Would you consider neural networks for this task?
Yes I would consider neural networks for this task. As mentioned in the book neural networks have good performance for high frequency series. The data is given monthly which is relatively high frequency and the forecasts are also short-term (2-3 months ahead) so neural networks could yield desirable forecasts. There is also the option of combining a neural network method with other methods to obtain ensemble forecasts.
Q.2
Use neural networks to forecast fortified wine sales, as follows:
\(\bullet\) Partition the data using the period until December 1993 as the training period.
\(\bullet\) Run a neural network using R’s \(\texttt{nnetar}\) with 11 non-seasonal lags (i.e., \(p=11\)). Leave all other arguments at their default.
First plot raw data:
library(forecast)
winesales <- read.csv("AustralianWines.csv")
#str(winesales)
#head(winesales)
#tail(winesales)
winesalesTS <- ts(winesales$Fortified,start=c(1980,1),frequency=12)
yrange = range(winesalesTS)
plot(c(1980,1995),yrange,type="n",xlab="Year",ylab="Fortified Wine Sales (thousands of liters)",bty="l",xaxt="n",yaxt="n")
lines(winesalesTS,bty="l")
axis(1,at=seq(1980,1995,1),labels=format(seq(1980,1995,1)))
axis(2,at=seq(1000,6000,500),labels=format(seq(1000,6000,500)),las=2)
The validation set is the duration of year 1994 (12 months) and the training set is the first 14 years (168 months)
validLength <- 12
trainLength <- length(winesalesTS) - validLength
winesalesTrain <- window(winesalesTS, end=c(1980,trainLength))
winesalesValid <- window(winesalesTS, start=c(1980,trainLength+1))
A neural network with 11 non-seasonal lags is obtained to be
#set random number generator seed
set.seed(8373493)
winesalesNN <- nnetar(winesalesTrain,p=11)
winesalesNN
## Series: winesalesTrain
## Model: NNAR(11,1,6)[12]
## Call: nnetar(y = winesalesTrain, p = 11)
##
## Average of 20 networks, each of which is
## a 12-6-1 network with 85 weights
## options were - linear output units
##
## sigma^2 estimated as 5734
summary(winesalesNN$model[[1]])
## a 12-6-1 network with 85 weights
## options were - linear output units
## b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1
## -0.14 1.60 -1.88 0.65 4.43 -2.46 -0.67 0.05 -3.61
## i9->h1 i10->h1 i11->h1 i12->h1
## -2.41 1.52 2.98 1.65
## b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2
## 0.44 -0.83 1.07 0.71 0.73 0.00 0.66 0.38 -1.07
## i9->h2 i10->h2 i11->h2 i12->h2
## 0.09 0.00 0.09 -1.09
## b->h3 i1->h3 i2->h3 i3->h3 i4->h3 i5->h3 i6->h3 i7->h3 i8->h3
## 1.21 3.92 -0.71 1.73 -2.67 1.49 0.36 -0.03 3.92
## i9->h3 i10->h3 i11->h3 i12->h3
## 0.30 -3.39 1.96 -5.85
## b->h4 i1->h4 i2->h4 i3->h4 i4->h4 i5->h4 i6->h4 i7->h4 i8->h4
## -2.17 3.83 1.14 0.28 -1.49 -0.99 3.75 2.98 -4.88
## i9->h4 i10->h4 i11->h4 i12->h4
## -0.52 -1.34 0.72 -1.59
## b->h5 i1->h5 i2->h5 i3->h5 i4->h5 i5->h5 i6->h5 i7->h5 i8->h5
## -0.89 1.48 -1.48 -0.52 -1.33 0.44 -0.22 -0.45 0.73
## i9->h5 i10->h5 i11->h5 i12->h5
## -0.91 0.29 0.24 -0.83
## b->h6 i1->h6 i2->h6 i3->h6 i4->h6 i5->h6 i6->h6 i7->h6 i8->h6
## -0.65 -1.81 -0.12 0.84 0.43 -0.82 -2.19 -1.92 3.76
## i9->h6 i10->h6 i11->h6 i12->h6
## -0.03 -0.85 0.71 -0.97
## b->o h1->o h2->o h3->o h4->o h5->o h6->o
## 1.98 1.59 -2.75 1.39 -1.27 -1.82 -2.04
A graphical comparison between the actual and forecasted series over the training period is
winesalesPred <- forecast(winesalesNN,h=validLength)
plot(c(1980,1994),yrange,type="n",xlab="Year",ylab="Fortified Wine Sales (thousands of liters)",bty="l",xaxt="n",yaxt="n")
lines(winesalesTrain,bty="l")
lines(winesalesPred$fitted,col="blue",lty=2,lwd=2)
axis(1,at=seq(1980,1994,1),labels=format(seq(1980,1994,1)))
axis(2,at=seq(1000,6000,500),labels=format(seq(1000,6000,500)),las=2)
legend(1989,6000,c("Actuals","Neural Network"),lty=c(1,2),col=c("black","blue"),lwd=c(1,2),bty="n")
The residual plot over the training period is
plot(winesalesPred$residuals,bty="l",ylab="Residuals")
The neural network forecasts fit the training period very well.
The neural network forecasts for the validation period are
winesalesPred$mean
## Jan Feb Mar Apr May Jun Jul
## 1994 1319.548 1407.169 1855.154 2049.701 2132.175 2376.346 2758.660
## Aug Sep Oct Nov Dec
## 1994 2581.042 2139.324 1798.984 2306.188 2608.601
Some accuracy measures…
accuracy(winesalesPred,winesalesValid)
## ME RMSE MAE MPE MAPE
## Training set -0.09094804 75.71998 57.53445 -0.2129987 2.097739
## Test set 138.75901697 289.14325 245.23423 5.1737224 10.881000
## MASE ACF1 Theil's U
## Training set 0.2067678 -0.009408954 NA
## Test set 0.8813246 -0.040116767 0.6427012
The MAPE for the training set is pretty good, only about 2.1%. This measure increases to about 10.88% for the validation period, which might raise some questions about overfitting the training period.
Q.3 Compare your neural network to an exponential smoothing model used to forecast fortified wine sales.
winesalesETS <- ets(winesalesTrain,model="ZZZ",restrict=FALSE)
winesalesETS
## ETS(M,A,M)
##
## Call:
## ets(y = winesalesTrain, model = "ZZZ", restrict = FALSE)
##
## Smoothing parameters:
## alpha = 0.0555
## beta = 9e-04
## gamma = 1e-04
##
## Initial states:
## l = 4040.0811
## b = -6.7983
## s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862
## 1.1463 1.1097 0.9345 0.8513 0.6996 0.5903
##
## sigma: 0.0859
##
## AIC AICc BIC
## 2755.038 2759.118 2808.145
The model selected as the best fit is \(\texttt{ETS(M,A,M)}\) (multiplicative error and seasonality, additive trend).
The forecasts for 1994 using this exponential smoothing model are
winesalesForecast <- forecast(winesalesETS,h=validLength)
winesalesForecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 1289.829 1147.913 1431.745 1072.788 1506.871
## Feb 1994 1521.475 1353.802 1689.148 1265.041 1777.909
## Mar 1994 1842.645 1639.237 2046.054 1531.559 2153.732
## Apr 1994 2013.011 1790.409 2235.614 1672.571 2353.452
## May 1994 2379.117 2115.554 2642.679 1976.033 2782.201
## Jun 1994 2445.906 2174.435 2717.376 2030.728 2861.083
## Jul 1994 2943.532 2616.195 3270.870 2442.913 3444.151
## Aug 1994 2688.471 2388.895 2988.047 2230.309 3146.633
## Sep 1994 1998.782 1775.592 2221.971 1657.443 2340.120
## Oct 1994 1857.773 1649.880 2065.666 1539.829 2175.717
## Nov 1994 2165.635 1922.749 2408.521 1794.173 2537.097
## Dec 1994 2344.995 2081.384 2608.606 1941.836 2748.153
Some accuracy measures of the exponential smoothing model…
accuracy(winesalesForecast,winesalesValid)
## ME RMSE MAE MPE MAPE MASE
## Training set -25.32466 287.8687 224.6507 -1.317643 7.229271 0.8073515
## Test set 125.56906 328.9246 256.3940 4.443793 10.858860 0.9214307
## ACF1 Theil's U
## Training set 0.05168201 NA
## Test set -0.01105575 0.7140459
The neural network forecasts for the training set outperform the exponential smoothing forecasts. We see the neural network yielded a RMSE of 75.72 (compared to exponential smoothing RMSE of 287.86) and a MAPE of only 2.09% (compared to exponential smoothing MAPE of 7.23%). For the test set, however, we see the MAPE for the neural network and exponential smoothing to be nearly identical (10.88% and 10.85%, respectively). Let’s now compare the two methods graphically over the validation period.
plot(c(1994,1995),c(0,3500),type="n",xlab="The Year 1994",ylab="Fortified Wine Sales (thousands of liters)",bty="l",xaxt="n",yaxt="n")
lines(winesalesValid,bty="l")
lines(winesalesPred$mean,col="blue",lwd=2)
lines(winesalesForecast$mean,col="red",lty=2)
axis(1, at=seq(1994,1995,1/11), labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul", "Aug","Sep","Oct","Nov","Dec"))
axis(2,at=seq(0,3500,500),labels=format(seq(0,3500,500)),las=2)
legend(1994.5,1500,c("Actuals","Neural Network","Exp Smoothing"),lty=c(1,1,2),col=c("black","blue","red"),lwd=c(1,2,1),bty="n")
Both methods achieved very similar MAPEs for the validation period, so we should not be surprised that their graphical comparisons are similar as well. There is some divergence between the two methods starting in May, where the neural network method forecasts lower than the exponential smoothing model. Around July the methods yield similar forecasts up to October, where the exponential smoothing method produces lower forecasts for the rest of the year. It is also observed that both methods underforecast for the majority of the time.