1. Would you consider neural networks for this task? Explain why.

#Imported
wine <- read.csv("AustralianWines.csv")

#Plotted to look at data
wineTS <- ts(wine$Fortified,start=c(1980, 1),frequency=12)
yrange = range(wineTS)
plot(c(1980, 1995),yrange,type="n",xlab="Year",ylab="Fortified wine sales (Liters, 1,000s)",bty="l",xaxt="n",yaxt="n")
lines(wineTS,bty="l")
axis(1,at=seq(1980, 1995, 1),labels=format(seq(1980, 1995, 1)))
axis(2,at=seq(1000, 6000, 1000),labels=format(seq(1000, 6000, 1000)),las=2)

They are acceptable here, since we’re looking for short-term forecasts that will be often repeated. Neural networks are good for these kinds of automated predictive tasks. Higher frequency data — such as hourly or daily — is better than monthly, but that’s still a high-enough frequency to make this work.

2. Use neural networks to forecast fortified wine sales as follows:

Partition the data using the period until December 1993 as the training period.

#Partitioned as directed: We have 180 months; training is 168 months
validLength <- 12
trainLength <- length(wineTS) - validLength
wineTrain <- window(wineTS, end=c(1980, trainLength))
wineValid <- window(wineTS, start=c(1980, trainLength+1))

Run a neural network using R’s nnetar with 11 non-seasonal lags (i.e. p = 11). Leave all other arguments at their default.

#Set seed, then run neural network
set.seed(8373493)
wineNN <- nnetar(wineTrain, p = 11)
wineNN
## Series: wineTrain 
## Model:  NNAR(11,1,6)[12] 
## Call:   nnetar(y = wineTrain, p = 11)
## 
## Average of 20 networks, each of which is
## a 12-6-1 network with 85 weights
## options were - linear output units 
## 
## sigma^2 estimated as 5734
summary(wineNN$model[[1]])
## a 12-6-1 network with 85 weights
## options were - linear output units 
##   b->h1  i1->h1  i2->h1  i3->h1  i4->h1  i5->h1  i6->h1  i7->h1  i8->h1 
##   -0.14    1.60   -1.88    0.65    4.43   -2.46   -0.67    0.05   -3.61 
##  i9->h1 i10->h1 i11->h1 i12->h1 
##   -2.41    1.52    2.98    1.65 
##   b->h2  i1->h2  i2->h2  i3->h2  i4->h2  i5->h2  i6->h2  i7->h2  i8->h2 
##    0.44   -0.83    1.07    0.71    0.73    0.00    0.66    0.38   -1.07 
##  i9->h2 i10->h2 i11->h2 i12->h2 
##    0.09    0.00    0.09   -1.09 
##   b->h3  i1->h3  i2->h3  i3->h3  i4->h3  i5->h3  i6->h3  i7->h3  i8->h3 
##    1.21    3.92   -0.71    1.73   -2.67    1.49    0.36   -0.03    3.92 
##  i9->h3 i10->h3 i11->h3 i12->h3 
##    0.30   -3.39    1.96   -5.85 
##   b->h4  i1->h4  i2->h4  i3->h4  i4->h4  i5->h4  i6->h4  i7->h4  i8->h4 
##   -2.17    3.83    1.14    0.28   -1.49   -0.99    3.75    2.98   -4.88 
##  i9->h4 i10->h4 i11->h4 i12->h4 
##   -0.52   -1.34    0.72   -1.59 
##   b->h5  i1->h5  i2->h5  i3->h5  i4->h5  i5->h5  i6->h5  i7->h5  i8->h5 
##   -0.89    1.48   -1.48   -0.52   -1.33    0.44   -0.22   -0.45    0.73 
##  i9->h5 i10->h5 i11->h5 i12->h5 
##   -0.91    0.29    0.24   -0.83 
##   b->h6  i1->h6  i2->h6  i3->h6  i4->h6  i5->h6  i6->h6  i7->h6  i8->h6 
##   -0.65   -1.81   -0.12    0.84    0.43   -0.82   -2.19   -1.92    3.76 
##  i9->h6 i10->h6 i11->h6 i12->h6 
##   -0.03   -0.85    0.71   -0.97 
##  b->o h1->o h2->o h3->o h4->o h5->o h6->o 
##  1.98  1.59 -2.75  1.39 -1.27 -1.82 -2.04

2a. Create a time plot for the actual and forecasted series over the training period. Create also a time plot of the forecast errors for the training period. Interpret what you see in the plots.

#Plot both, one in red
winePred <- forecast(wineNN,h=validLength)
plot(c(1980,1994),yrange,type="n",xlab="Year",ylab="Fortified wine sales (Liters, 1000s)",bty="l",xaxt="n",yaxt="n")
lines(wineTrain,bty="l")
lines(winePred$fitted,col="red",lty=2,lwd=2)
axis(1,at=seq(1980,1994,1),labels=format(seq(1980,1994,1)))
axis(2,at=seq(1000,6000,1000),labels=format(seq(1000,6000,1000)),las=2)
legend(1989,6000,c("Actual","Neural network"),lty=c(1,2),col=c("black","red"),lwd=c(1,2),bty="n")

#Basic residual plot
plot(winePred$residuals,bty="l",ylab="Residuals")

The neural network model fits the actual data very well, with residuals relatively well-distributed around 0.

2b. Use the neural network to forecast sales for each month in the validation period (January 1994 to December 1994).

winePred$mean
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1994 1319.548 1407.169 1855.154 2049.701 2132.175 2376.346 2758.660
##           Aug      Sep      Oct      Nov      Dec
## 1994 2581.042 2139.324 1798.984 2306.188 2608.601

3. Compare your neural network to an exponential smoothing model used to forecast fortified wine sales.

3a. Use R’s ets function to automatically select and fit an exponential smoothing method until December 1993. Which model did ets fit?

wineETS <- ets(wineTrain, model="ZZZ", restrict=FALSE)
wineETS
## ETS(M,A,M) 
## 
## Call:
##  ets(y = wineTrain, model = "ZZZ", restrict = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.0555 
##     beta  = 9e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4040.0811 
##     b = -6.7983 
##     s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862
##            1.1463 1.1097 0.9345 0.8513 0.6996 0.5903
## 
##   sigma:  0.0859
## 
##      AIC     AICc      BIC 
## 2755.038 2759.118 2808.145

It fits an MAM model, which is a multiplicative Holt-Winters’ method with multiplicative errors.

3b. Use this model to forecast sales for each month in 1994.

wineETSPred <- forecast(wineETS,h=validLength)
wineETSPred
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1994       1289.829 1147.913 1431.745 1072.788 1506.871
## Feb 1994       1521.475 1353.802 1689.148 1265.041 1777.909
## Mar 1994       1842.645 1639.237 2046.054 1531.559 2153.732
## Apr 1994       2013.011 1790.409 2235.614 1672.571 2353.452
## May 1994       2379.117 2115.554 2642.679 1976.033 2782.201
## Jun 1994       2445.906 2174.435 2717.376 2030.728 2861.083
## Jul 1994       2943.532 2616.195 3270.870 2442.913 3444.151
## Aug 1994       2688.471 2388.895 2988.047 2230.309 3146.633
## Sep 1994       1998.782 1775.592 2221.971 1657.443 2340.120
## Oct 1994       1857.773 1649.880 2065.666 1539.829 2175.717
## Nov 1994       2165.635 1922.749 2408.521 1794.173 2537.097
## Dec 1994       2344.995 2081.384 2608.606 1941.836 2748.153

3c. How does the neural network compare to the exponential smoothing model in terms of predictive performance in the training period? The validation period?

#Call neural network figures
accuracy(winePred, wineValid)
##                        ME      RMSE       MAE        MPE      MAPE
## Training set  -0.09094804  75.71998  57.53445 -0.2129987  2.097739
## Test set     138.75901697 289.14325 245.23423  5.1737224 10.881000
##                   MASE         ACF1 Theil's U
## Training set 0.2067678 -0.009408954        NA
## Test set     0.8813246 -0.040116767 0.6427012
#Call ETS
accuracy(wineETSPred, wineValid)
##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -25.32466 287.8687 224.6507 -1.317643  7.229271 0.8073515
## Test set     125.56906 328.9246 256.3940  4.443793 10.858860 0.9214307
##                     ACF1 Theil's U
## Training set  0.05168201        NA
## Test set     -0.01105575 0.7140459

The neural network outperforms the smoothing method significantly on the training set, with the former having strong RMSE and MAPE values of 75.71 and 2.09 to the latter’s 287.9 and 7.22, respectively. On the test set, the neural network performs only slightly better than the smoothing method.

#Plot actual, neural network and exponential smoothing
plot(c(1994,1995),c(1000,3000),type="n",xlab="1994",ylab="Fortified wine sales (Liters, 1,000s)",bty="l",xaxt="n",yaxt="n")
lines(wineValid,bty="l")
lines(winePred$mean,col="blue",lwd=2)
lines(wineETSPred$mean,col="red",lty=2)
axis(1, at=seq(1994,1995,1/11), labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul", "Aug","Sep","Oct","Nov","Dec"))
axis(2,at=seq(1000,3000,500),labels=format(seq(1000,3000,500)),las=2)
legend(1994.4,1700,c("Actual","Neural network","Exponential smoothing"),lty=c(1,1,2),col=c("black","blue","red"),lwd=c(1,2,1),bty="n")

Here, we plot only the validation period and expect to see similar results, since the MAPE values are so similar for both. Indeed, they mostly underforecast the actual numbers and generally mirror each other, though the neural network tends to stick to the actual data better.