library(forecast)
library(knitr)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
setwd("~/Documents/MBA 678/Unit 7") 

Question 1

Artificial Neural Networks could be a worthwhile forecasting approach in this scenario for a few reasons:

  1. We have lots of data to feed to the algorithm.
  2. We have several forecasts to produce.
  3. Where this task is expected to be repeated, a higher level of automation is desirable.
  4. The task as described is almost purely predictive; no emphasis is placed on descriptive needs from the analysis. The “black box” nature of neural network forecasting may be acceptable in this context.

Question 2

wine <-read.csv("wine.csv", header = TRUE, stringsAsFactors = FALSE)
wineTS <-ts(wine$Fortified, start=c(1980), frequency=12)
#raw plot to check
#plot(wineTS)
validLength <- 12
# Set the length of the training period to everything else
trainLength <- length(wineTS) - validLength
# Partition the data into training and validation periods
wineTrain <- window(wineTS, end=c(1980, trainLength))
wineValid <- window(wineTS, start=c(1980,trainLength+1), end=c(1980,trainLength+validLength))
set.seed(8373493)
wineNN <- nnetar(wineTrain, p=11)
wineNN
## Series: wineTrain 
## Model:  NNAR(11,1,6)[12] 
## Call:   nnetar(y = wineTrain, p = 11)
## 
## Average of 20 networks, each of which is
## a 12-6-1 network with 85 weights
## options were - linear output units 
## 
## sigma^2 estimated as 5734
summary(wineNN$model[[1]])
## a 12-6-1 network with 85 weights
## options were - linear output units 
##   b->h1  i1->h1  i2->h1  i3->h1  i4->h1  i5->h1  i6->h1  i7->h1  i8->h1 
##   -0.14    1.60   -1.88    0.65    4.43   -2.46   -0.67    0.05   -3.61 
##  i9->h1 i10->h1 i11->h1 i12->h1 
##   -2.41    1.52    2.98    1.65 
##   b->h2  i1->h2  i2->h2  i3->h2  i4->h2  i5->h2  i6->h2  i7->h2  i8->h2 
##    0.44   -0.83    1.07    0.71    0.73    0.00    0.66    0.38   -1.07 
##  i9->h2 i10->h2 i11->h2 i12->h2 
##    0.09    0.00    0.09   -1.09 
##   b->h3  i1->h3  i2->h3  i3->h3  i4->h3  i5->h3  i6->h3  i7->h3  i8->h3 
##    1.21    3.92   -0.71    1.73   -2.67    1.49    0.36   -0.03    3.92 
##  i9->h3 i10->h3 i11->h3 i12->h3 
##    0.30   -3.39    1.96   -5.85 
##   b->h4  i1->h4  i2->h4  i3->h4  i4->h4  i5->h4  i6->h4  i7->h4  i8->h4 
##   -2.17    3.83    1.14    0.28   -1.49   -0.99    3.75    2.98   -4.88 
##  i9->h4 i10->h4 i11->h4 i12->h4 
##   -0.52   -1.34    0.72   -1.59 
##   b->h5  i1->h5  i2->h5  i3->h5  i4->h5  i5->h5  i6->h5  i7->h5  i8->h5 
##   -0.89    1.48   -1.48   -0.52   -1.33    0.44   -0.22   -0.45    0.73 
##  i9->h5 i10->h5 i11->h5 i12->h5 
##   -0.91    0.29    0.24   -0.83 
##   b->h6  i1->h6  i2->h6  i3->h6  i4->h6  i5->h6  i6->h6  i7->h6  i8->h6 
##   -0.65   -1.81   -0.12    0.84    0.43   -0.82   -2.19   -1.92    3.76 
##  i9->h6 i10->h6 i11->h6 i12->h6 
##   -0.03   -0.85    0.71   -0.97 
##  b->o h1->o h2->o h3->o h4->o h5->o h6->o 
##  1.98  1.59 -2.75  1.39 -1.27 -1.82 -2.04

2a

Let’s plot what we’ve created so far:

#par(mfrow = c(1,2))
plot(c(1980, 1994), c(1000,6000), type="n", xlab="Year",  ylab="Aussie Fortified Wine Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1980,1994,1), labels=format(seq(1980,1994,1)))
# Add the y-axis
axis(2, at=seq(1000, 6000, 1000), labels=format(seq(1000, 6000, 1000)), las=0)
#Add the time series deptTS
lines(wineTS, bty="l")
lines(wineNN$fitted, col="blue", lwd=2, lty=2)
legend(8,6000, c("Actuals", "Neural Network"), lty=c(1,2), col=c("black", "blue"), lwd=c(1,2,1), bty="n")

plot(wineNN$residuals)

accuracy(wineNN)
##                       ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.09094804 75.71998 57.53445 -0.2129987 2.097739 0.2067678
##                      ACF1
## Training set -0.009408954

Interpreting the fit of our NN model:

The plot of the actual time series overlayed with the fitted NN model suggests a pretty good fit. There are no noticeable deviations where the model is consistently different from the base series. Likewise the residuals plot shows the residuals centered around zero, with no obvious seasonality or trend pattern. The magnitude of the residuals increases slightly toward the end of the training period, suggesting changes in the series may be more erratic as time progressed.

By calling accuracy, we find a MAPE of 2.1%, also suggesting a pretty good fit to the training period. When we produce a forecast for the validation period, we’ll want to watch for overfitting of the training period.

2b

#Let's create forecasts based on our NN model for the validation period:
winePred <- forecast(wineNN, h=validLength)
#Let's output the point forecasts:
winePred$mean
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1994 1319.548 1407.169 1855.154 2049.701 2132.175 2376.346 2758.660
##           Aug      Sep      Oct      Nov      Dec
## 1994 2581.042 2139.324 1798.984 2306.188 2608.601
#Let's take a look at the accuracy:
accuracy(winePred,wineValid) 
##                        ME      RMSE       MAE        MPE      MAPE
## Training set  -0.09094804  75.71998  57.53445 -0.2129987  2.097739
## Test set     138.75901697 289.14325 245.23423  5.1737224 10.881000
##                   MASE         ACF1 Theil's U
## Training set 0.2067678 -0.009408954        NA
## Test set     0.8813246 -0.040116767 0.6427012

Let’s format the point forecasts for the validation period into a simple table:

Month (1994) Forecast
Jan 1319.548
Feb 1407.169
Mar 1855.154
Apr 2049.701
May 2132.175
Jun 2376.346
Jul 2758.660
Aug 2581.042
Sep 2139.324
Oct 1798.984
Nov 2306.188
Dec 2608.601

Question 3

Next, let’s produce a Holt-Winters forecast for the same series and compare:

#Create hw model and forecast
wine.hw <- ets(wineTrain, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE)
wine.hwfcst <- forecast(wine.hw, h=validLength, level=0)
wine.hwfcst
##          Point Forecast     Lo 0     Hi 0
## Jan 1994       1302.430 1301.291 1301.291
## Feb 1994       1569.495 1568.186 1568.186
## Mar 1994       1890.381 1887.986 1887.986
## Apr 1994       2075.542 2072.648 2072.648
## May 1994       2448.559 2448.784 2448.784
## Jun 1994       2524.946 2526.919 2526.919
## Jul 1994       3049.157 3043.520 3043.520
## Aug 1994       2756.247 2753.629 2753.629
## Sep 1994       2054.354 2054.322 2054.322
## Oct 1994       1919.635 1915.391 1915.391
## Nov 1994       2256.245 2251.508 2251.508
## Dec 1994       2406.063 2403.926 2403.926
plot(wine.hwfcst)

3a

From the code executed above, we see that the ets function fitted an exponential smoothing model type MMM. This means the function fitted a model to the training set with multiplicative error, trend, and seasonality.

3b

Using our ets / Holt-Winters model, we fit the point forecasts below:

Month (1994) Forecast
Jan 1302.430
Feb 1569.495
Mar 1890.381
Apr 2075.542
May 2448.559
Jun 2524.946
Jul 3049.157
Aug 2756.247
Sep 2054.354
Oct 1919.635
Nov 2256.245
Dec 2406.063

3c

#Let's compare accuracy over both methods, with the NN model first:
accuracy(winePred,wineValid)
##                        ME      RMSE       MAE        MPE      MAPE
## Training set  -0.09094804  75.71998  57.53445 -0.2129987  2.097739
## Test set     138.75901697 289.14325 245.23423  5.1737224 10.881000
##                   MASE         ACF1 Theil's U
## Training set 0.2067678 -0.009408954        NA
## Test set     0.8813246 -0.040116767 0.6427012
#And now the HW model:
accuracy(wine.hwfcst,wineValid)
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -4.06882 281.6645 217.3976 -0.7814925 6.989709 0.7812852
## Test set     62.07894 309.0087 235.7270  1.7052608 9.950926 0.8471576
##                    ACF1 Theil's U
## Training set 0.04951044        NA
## Test set     0.01101617 0.6501182

Looking at accuracy between the two models, we observe some interesting differences.

  1. The NN approach had lower (2.1% vs 7.0%) MAPE than the HW approach in the training set, which indicates that the NN achieved a closer fit of the training series than the Holt-Winters model.

  2. The HW model achieved lower MAPE (9.6% vs NN’s 10.9%) on the test set, indicating that a very slightly better predictive performance from the HW model than the NN, despite the NN’s “better” fit on the training period.

  3. We observe a much greater difference between the NN models’ MAPE in training versus validation period than we do for the same difference in the HW model. This suggests at least a concern that the NN approach is over-fitting the training set, at least a little bit, in this example.

  4. For further analysis, it would be interesting to average the two models’ output and create an ensemble forecast, to test whether an ensemble approach would yield overall better predictive results.