library(forecast)
library(knitr)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
setwd("~/Documents/MBA 678/Unit 7")
Artificial Neural Networks could be a worthwhile forecasting approach in this scenario for a few reasons:
wine <-read.csv("wine.csv", header = TRUE, stringsAsFactors = FALSE)
wineTS <-ts(wine$Fortified, start=c(1980), frequency=12)
#raw plot to check
#plot(wineTS)
validLength <- 12
# Set the length of the training period to everything else
trainLength <- length(wineTS) - validLength
# Partition the data into training and validation periods
wineTrain <- window(wineTS, end=c(1980, trainLength))
wineValid <- window(wineTS, start=c(1980,trainLength+1), end=c(1980,trainLength+validLength))
set.seed(8373493)
wineNN <- nnetar(wineTrain, p=11)
wineNN
## Series: wineTrain
## Model: NNAR(11,1,6)[12]
## Call: nnetar(y = wineTrain, p = 11)
##
## Average of 20 networks, each of which is
## a 12-6-1 network with 85 weights
## options were - linear output units
##
## sigma^2 estimated as 5734
summary(wineNN$model[[1]])
## a 12-6-1 network with 85 weights
## options were - linear output units
## b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1
## -0.14 1.60 -1.88 0.65 4.43 -2.46 -0.67 0.05 -3.61
## i9->h1 i10->h1 i11->h1 i12->h1
## -2.41 1.52 2.98 1.65
## b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2
## 0.44 -0.83 1.07 0.71 0.73 0.00 0.66 0.38 -1.07
## i9->h2 i10->h2 i11->h2 i12->h2
## 0.09 0.00 0.09 -1.09
## b->h3 i1->h3 i2->h3 i3->h3 i4->h3 i5->h3 i6->h3 i7->h3 i8->h3
## 1.21 3.92 -0.71 1.73 -2.67 1.49 0.36 -0.03 3.92
## i9->h3 i10->h3 i11->h3 i12->h3
## 0.30 -3.39 1.96 -5.85
## b->h4 i1->h4 i2->h4 i3->h4 i4->h4 i5->h4 i6->h4 i7->h4 i8->h4
## -2.17 3.83 1.14 0.28 -1.49 -0.99 3.75 2.98 -4.88
## i9->h4 i10->h4 i11->h4 i12->h4
## -0.52 -1.34 0.72 -1.59
## b->h5 i1->h5 i2->h5 i3->h5 i4->h5 i5->h5 i6->h5 i7->h5 i8->h5
## -0.89 1.48 -1.48 -0.52 -1.33 0.44 -0.22 -0.45 0.73
## i9->h5 i10->h5 i11->h5 i12->h5
## -0.91 0.29 0.24 -0.83
## b->h6 i1->h6 i2->h6 i3->h6 i4->h6 i5->h6 i6->h6 i7->h6 i8->h6
## -0.65 -1.81 -0.12 0.84 0.43 -0.82 -2.19 -1.92 3.76
## i9->h6 i10->h6 i11->h6 i12->h6
## -0.03 -0.85 0.71 -0.97
## b->o h1->o h2->o h3->o h4->o h5->o h6->o
## 1.98 1.59 -2.75 1.39 -1.27 -1.82 -2.04
Let’s plot what we’ve created so far:
#par(mfrow = c(1,2))
plot(c(1980, 1994), c(1000,6000), type="n", xlab="Year", ylab="Aussie Fortified Wine Sales", bty="l", xaxt="n", yaxt="n")
# Add the x-axis
axis(1, at=seq(1980,1994,1), labels=format(seq(1980,1994,1)))
# Add the y-axis
axis(2, at=seq(1000, 6000, 1000), labels=format(seq(1000, 6000, 1000)), las=0)
#Add the time series deptTS
lines(wineTS, bty="l")
lines(wineNN$fitted, col="blue", lwd=2, lty=2)
legend(8,6000, c("Actuals", "Neural Network"), lty=c(1,2), col=c("black", "blue"), lwd=c(1,2,1), bty="n")
plot(wineNN$residuals)
accuracy(wineNN)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09094804 75.71998 57.53445 -0.2129987 2.097739 0.2067678
## ACF1
## Training set -0.009408954
Interpreting the fit of our NN model:
The plot of the actual time series overlayed with the fitted NN model suggests a pretty good fit. There are no noticeable deviations where the model is consistently different from the base series. Likewise the residuals plot shows the residuals centered around zero, with no obvious seasonality or trend pattern. The magnitude of the residuals increases slightly toward the end of the training period, suggesting changes in the series may be more erratic as time progressed.
By calling accuracy, we find a MAPE of 2.1%, also suggesting a pretty good fit to the training period. When we produce a forecast for the validation period, we’ll want to watch for overfitting of the training period.
#Let's create forecasts based on our NN model for the validation period:
winePred <- forecast(wineNN, h=validLength)
#Let's output the point forecasts:
winePred$mean
## Jan Feb Mar Apr May Jun Jul
## 1994 1319.548 1407.169 1855.154 2049.701 2132.175 2376.346 2758.660
## Aug Sep Oct Nov Dec
## 1994 2581.042 2139.324 1798.984 2306.188 2608.601
#Let's take a look at the accuracy:
accuracy(winePred,wineValid)
## ME RMSE MAE MPE MAPE
## Training set -0.09094804 75.71998 57.53445 -0.2129987 2.097739
## Test set 138.75901697 289.14325 245.23423 5.1737224 10.881000
## MASE ACF1 Theil's U
## Training set 0.2067678 -0.009408954 NA
## Test set 0.8813246 -0.040116767 0.6427012
Let’s format the point forecasts for the validation period into a simple table:
| Month (1994) | Forecast |
|---|---|
| Jan | 1319.548 |
| Feb | 1407.169 |
| Mar | 1855.154 |
| Apr | 2049.701 |
| May | 2132.175 |
| Jun | 2376.346 |
| Jul | 2758.660 |
| Aug | 2581.042 |
| Sep | 2139.324 |
| Oct | 1798.984 |
| Nov | 2306.188 |
| Dec | 2608.601 |
Next, let’s produce a Holt-Winters forecast for the same series and compare:
#Create hw model and forecast
wine.hw <- ets(wineTrain, model = "ZZZ", restrict = FALSE, allow.multiplicative.trend = TRUE)
wine.hwfcst <- forecast(wine.hw, h=validLength, level=0)
wine.hwfcst
## Point Forecast Lo 0 Hi 0
## Jan 1994 1302.430 1301.291 1301.291
## Feb 1994 1569.495 1568.186 1568.186
## Mar 1994 1890.381 1887.986 1887.986
## Apr 1994 2075.542 2072.648 2072.648
## May 1994 2448.559 2448.784 2448.784
## Jun 1994 2524.946 2526.919 2526.919
## Jul 1994 3049.157 3043.520 3043.520
## Aug 1994 2756.247 2753.629 2753.629
## Sep 1994 2054.354 2054.322 2054.322
## Oct 1994 1919.635 1915.391 1915.391
## Nov 1994 2256.245 2251.508 2251.508
## Dec 1994 2406.063 2403.926 2403.926
plot(wine.hwfcst)
From the code executed above, we see that the ets function fitted an exponential smoothing model type MMM. This means the function fitted a model to the training set with multiplicative error, trend, and seasonality.
Using our ets / Holt-Winters model, we fit the point forecasts below:
| Month (1994) | Forecast |
|---|---|
| Jan | 1302.430 |
| Feb | 1569.495 |
| Mar | 1890.381 |
| Apr | 2075.542 |
| May | 2448.559 |
| Jun | 2524.946 |
| Jul | 3049.157 |
| Aug | 2756.247 |
| Sep | 2054.354 |
| Oct | 1919.635 |
| Nov | 2256.245 |
| Dec | 2406.063 |
#Let's compare accuracy over both methods, with the NN model first:
accuracy(winePred,wineValid)
## ME RMSE MAE MPE MAPE
## Training set -0.09094804 75.71998 57.53445 -0.2129987 2.097739
## Test set 138.75901697 289.14325 245.23423 5.1737224 10.881000
## MASE ACF1 Theil's U
## Training set 0.2067678 -0.009408954 NA
## Test set 0.8813246 -0.040116767 0.6427012
#And now the HW model:
accuracy(wine.hwfcst,wineValid)
## ME RMSE MAE MPE MAPE MASE
## Training set -4.06882 281.6645 217.3976 -0.7814925 6.989709 0.7812852
## Test set 62.07894 309.0087 235.7270 1.7052608 9.950926 0.8471576
## ACF1 Theil's U
## Training set 0.04951044 NA
## Test set 0.01101617 0.6501182
Looking at accuracy between the two models, we observe some interesting differences.
The NN approach had lower (2.1% vs 7.0%) MAPE than the HW approach in the training set, which indicates that the NN achieved a closer fit of the training series than the Holt-Winters model.
The HW model achieved lower MAPE (9.6% vs NN’s 10.9%) on the test set, indicating that a very slightly better predictive performance from the HW model than the NN, despite the NN’s “better” fit on the training period.
We observe a much greater difference between the NN models’ MAPE in training versus validation period than we do for the same difference in the HW model. This suggests at least a concern that the NN approach is over-fitting the training set, at least a little bit, in this example.
For further analysis, it would be interesting to average the two models’ output and create an ensemble forecast, to test whether an ensemble approach would yield overall better predictive results.