# Load the forecast package (provides nnetar, ets, forecast, and accuracy)
library(forecast)
ausWines <- read.csv("AustralianWines.csv", stringsAsFactors = FALSE)
The goal is to obtain short-term forecasts for a series; the task is predictive, not explanatory or descriptive. Because neural networks are data-driven, they need many training periods to estimate their weights before they can generate predictions. One advantage of neural networks is that they handle both numerical and binary forecasting. Neural networks tend to perform better on high-frequency series, and although the data in this problem is monthly, there are enough observations to provide an adequate training period. Given the monthly frequency, I would consider a simple neural network with no hidden layer or a single hidden layer. Another option is to fit alternative methods and compare their performance to the neural network model, or to combine them into an ensemble.
• Partition the data using the period until December 1993 as the training period.
• Run a neural network using R’s nnetar with 11 non-seasonal lags (i.e., p = 11). Leave all other arguments at their default values.
Start by creating a time plot of the original time series.
# Create a time series object out of Australian Fortified Wine Sales
ausWines.ts <- ts(ausWines$Fortified, start = c(1980,1), frequency = 12)
# Set up the time plot
par(oma = c(0,2,0,0))
yrange = range(ausWines.ts)
plot(c(1980,1995), yrange, type = "n", xlab = "Year", ylab = "Fortified Wine Sales \n (Thousands of Liters)", bty = "l", xaxt = "n", yaxt = "n", line = 2, cex = 0.8)
# Add the x and y axes
axis(1, at = seq(1980, 1995,1), labels = format(seq(1980, 1995, 1)), cex.axis = 0.7)
axis(2, at = seq(1000,6000,1000), labels = format(seq(1000,6000,1000)), cex.axis = 0.7, las = 0)
# Add the time series ausWines
lines(ausWines.ts, bty = "l")
Now partition the data into the training and validation periods specified above.
# Set the validation and training period lengths
valid.Length <- 12
train.Length <- length(ausWines.ts) - valid.Length
# Partition the time series into training and validation periods
fortWine.train <- window(ausWines.ts, end = c(1980, train.Length))
fortWine.valid <- window(ausWines.ts, start = c(1980, train.Length + 1))
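Note that window() accepts a month index larger than 12, so end = c(1980, train.Length) resolves to December 1993 (assuming the series runs January 1980 through December 1994, train.Length is 168). A quick sanity check of the split:
# Sanity-check the partition: training should end in December 1993 and
# validation should start in January 1994
end(fortWine.train)    # expect: 1993 12
start(fortWine.valid)  # expect: 1994 1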
Now run a neural network using the nnetar function with 11 non-seasonal lags (p = 11), leaving all other arguments at their defaults.
# Create a reproducible output by setting the random number generator seed (using a large prime number)
set.seed(8373493)
# Call nnetar function
fortWine.nnetar <- nnetar(fortWine.train, p = 11)
# Take a look at the neural network autoregression
fortWine.nnetar
## Series: fortWine.train
## Model: NNAR(11,1,6)[12]
## Call: nnetar(y = fortWine.train, p = 11)
##
## Average of 20 networks, each of which is
## a 12-6-1 network with 85 weights
## options were - linear output units
##
## sigma^2 estimated as 5734
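The label NNAR(11,1,6)[12] means 11 non-seasonal lags, 1 seasonal lag (lag 12), and 6 hidden nodes, so each network has 12 inputs. The 85 weights reported above follow directly from that 12-6-1 architecture:
# Count the weights in a 12-6-1 network: each of the 6 hidden nodes takes
# 12 inputs plus a bias, and the single linear output node takes 6 hidden
# activations plus a bias
inputs <- 11 + 1                           # 11 non-seasonal lags + 1 seasonal lag
hidden <- 6
(inputs + 1) * hidden + (hidden + 1) * 1   # 78 + 7 = 85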
# Take a look at the weights of the first of the 20 fitted networks (each is initialized pseudo-randomly)
summary(fortWine.nnetar$model[[1]])
## a 12-6-1 network with 85 weights
## options were - linear output units
## b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1
## -0.14 1.60 -1.88 0.65 4.43 -2.46 -0.67 0.05 -3.61
## i9->h1 i10->h1 i11->h1 i12->h1
## -2.41 1.52 2.98 1.65
## b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2
## 0.44 -0.83 1.07 0.71 0.73 0.00 0.66 0.38 -1.07
## i9->h2 i10->h2 i11->h2 i12->h2
## 0.09 0.00 0.09 -1.09
## b->h3 i1->h3 i2->h3 i3->h3 i4->h3 i5->h3 i6->h3 i7->h3 i8->h3
## 1.21 3.92 -0.71 1.73 -2.67 1.49 0.36 -0.03 3.92
## i9->h3 i10->h3 i11->h3 i12->h3
## 0.30 -3.39 1.96 -5.85
## b->h4 i1->h4 i2->h4 i3->h4 i4->h4 i5->h4 i6->h4 i7->h4 i8->h4
## -2.17 3.83 1.14 0.28 -1.49 -0.99 3.75 2.98 -4.88
## i9->h4 i10->h4 i11->h4 i12->h4
## -0.52 -1.34 0.72 -1.59
## b->h5 i1->h5 i2->h5 i3->h5 i4->h5 i5->h5 i6->h5 i7->h5 i8->h5
## -0.89 1.48 -1.48 -0.52 -1.33 0.44 -0.22 -0.45 0.73
## i9->h5 i10->h5 i11->h5 i12->h5
## -0.91 0.29 0.24 -0.83
## b->h6 i1->h6 i2->h6 i3->h6 i4->h6 i5->h6 i6->h6 i7->h6 i8->h6
## -0.65 -1.81 -0.12 0.84 0.43 -0.82 -2.19 -1.92 3.76
## i9->h6 i10->h6 i11->h6 i12->h6
## -0.03 -0.85 0.71 -0.97
## b->o h1->o h2->o h3->o h4->o h5->o h6->o
## 1.98 1.59 -2.75 1.39 -1.27 -1.82 -2.04
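For intuition, the table above can be read as one forward pass: each hidden node (h1–h6) applies a logistic squash to a weighted sum of the 12 inputs (i1–i12) plus a bias (b), and the linear output node (o) combines the six hidden activations. Below is a minimal sketch of that computation with generic weight arguments; nnetar also scales the series and constructs the lagged inputs internally, which this sketch omits.
# One forward pass of a 12-6-1 network with logistic hidden units and a
# linear output unit, mirroring the structure printed by summary() above.
# x: 12 scaled inputs; W: 6x12 input-to-hidden weights; b: 6 hidden biases;
# v: 6 hidden-to-output weights; b0: output bias
sigmoid <- function(z) 1 / (1 + exp(-z))
nn_forward <- function(x, W, b, v, b0) {
  h <- sigmoid(W %*% x + b)    # hidden activations
  as.numeric(b0 + sum(v * h))  # linear output unit
}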
# Make predictions for the validation period
fortWine.nnetar.pred <- forecast(fortWine.nnetar, h = valid.Length)
# Generate accuracy metrics -- note the bug here: the second argument should be
# the validation series (fortWine.valid), not the scalar valid.Length, so the
# "Test set" row below is meaningless; the corrected call appears further down
accuracy(fortWine.nnetar.pred, valid.Length)
## ME RMSE MAE MPE
## Training set -9.094804e-02 75.71998 57.53445 -2.129987e-01
## Test set -1.307548e+03 1307.54799 1307.54799 -1.089623e+04
## MAPE MASE ACF1
## Training set 2.097739 0.09965001 -0.009408954
## Test set 10896.233282 2.26468072 NA
# Now create the time plot
plot(c(1980,1994), c(1000,6000), type = "n", xlab = "Year", ylab = "Fortified Wine Sales \n (Millions of Liters)", bty = "l", xaxt = "n", yaxt = "n",lty = 1, line = 2)
# Add the x and y axes
axis(1, at = seq(1980, 1994, 1), labels = format(seq(1980, 1994, 1)), cex.axis = 0.7)
axis(2, at = seq(1000,6000,500), labels = format(seq(1,6,0.5)), las = 2, cex.axis = 0.7)
# Add the series to the plot
lines(fortWine.train, bty = "l")
lines(fortWine.nnetar.pred$fitted, lwd = 2, lty = 2, col = "red")
# Add a legend near the top of the plot (user coordinates run from 1000 to 6000)
legend(1989, 6000, c("Fortified Wine", "Neural Network"), lty = c(1,2), col = c("black", "red"), lwd = c(1,2), bty = "n")
The training-set MAPE looks acceptable, but the enormous validation-set numbers are not evidence of underfitting: they are an artifact of passing the scalar valid.Length to accuracy() instead of the validation series, so the forecasts were compared against the number 12. The corrected call appears below.
Create a time plot of the forecast errors:
plot(fortWine.nnetar.pred$residuals, bty = "l", ylab = "Residuals", main = "Fortified Wine NN Errors (Training Data)", col = "blue")
abline(0,0)
We see no apparent U-shape or other systematic pattern of dips and peaks in the residuals. One could argue that the balance of over- and under-prediction shifts around 1989, but overall the NN model seems to fit the training period well.
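As an additional diagnostic (not in the original analysis), the residuals’ autocorrelation can be checked; little remaining autocorrelation would support the visual impression of a good fit. The first 12 residuals are NA because the model needs 12 lagged values before fitting begins, so they are dropped first:
# ACF of the NN training residuals, after removing the leading NAs
Acf(na.omit(fortWine.nnetar.pred$residuals), main = "ACF of NN Training Residuals")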
# Display the NN point forecasts for the validation period
fortWine.nnetar.pred$mean
## Jan Feb Mar Apr May Jun Jul
## 1994 1319.548 1407.169 1855.154 2049.701 2132.175 2376.346 2758.660
## Aug Sep Oct Nov Dec
## 1994 2581.042 2139.324 1798.984 2306.188 2608.601
# Check the accuracy metrics again, this time correctly passing the validation series
accuracy(fortWine.nnetar.pred, fortWine.valid)
## ME RMSE MAE MPE MAPE
## Training set -0.09094804 75.71998 57.53445 -0.2129987 2.097739
## Test set 138.75901697 289.14325 245.23423 5.1737224 10.881000
## MASE ACF1 Theil's U
## Training set 0.2067678 -0.009408954 NA
## Test set 0.8813246 -0.040116767 0.6427012
With the validation series passed correctly, the test-set metrics are now sensible: the validation MAPE is about 10.9%, versus 2.1% in the training period. The validation MAPE is reasonable, though the large gap between training and validation error already hints that the network may be overfitting the training data.
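As a quick sanity check (not part of the original output), the validation MAPE can be recomputed directly from its definition, the mean of |actual − forecast| / actual, times 100:
# Hand-rolled MAPE; should reproduce the "Test set" MAPE above (about 10.88)
mape <- function(actual, pred) mean(abs((actual - pred) / actual)) * 100
mape(fortWine.valid, fortWine.nnetar.pred$mean)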
# Apply Holt-Winters exponential smoothing, letting ets() select the model form automatically
ausWines.ets <- ets(fortWine.train, model = "ZZZ", restrict = FALSE)
# See what the result is
ausWines.ets
## ETS(M,A,M)
##
## Call:
## ets(y = fortWine.train, model = "ZZZ", restrict = FALSE)
##
## Smoothing parameters:
## alpha = 0.0555
## beta = 9e-04
## gamma = 1e-04
##
## Initial states:
## l = 4040.0811
## b = -6.7983
## s=1.1316 1.0399 0.8877 0.9505 1.2722 1.3862
## 1.1463 1.1097 0.9345 0.8513 0.6996 0.5903
##
## sigma: 0.0859
##
## AIC AICc BIC
## 2755.038 2759.118 2808.145
The automated selection found that an ETS(M,A,M) model (multiplicative error, additive trend, multiplicative seasonality) fits best.
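Since that form is essentially a multiplicative Holt-Winters model, the same fit can presumably be requested directly through forecast’s hw() convenience wrapper; this is a sketch of that shortcut, not something verified against the output above:
# hw() wraps ets() for Holt-Winters-style models; with multiplicative
# seasonality it should select the same ETS(M,A,M) form chosen above
fortWine.hw <- hw(fortWine.train, seasonal = "multiplicative", h = valid.Length)
fortWine.hw$model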
# Generate the forecasts
fortWine.forecast <- forecast(ausWines.ets, h = valid.Length)
fortWine.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 1289.829 1147.913 1431.745 1072.788 1506.871
## Feb 1994 1521.475 1353.802 1689.148 1265.041 1777.909
## Mar 1994 1842.645 1639.237 2046.054 1531.559 2153.732
## Apr 1994 2013.011 1790.409 2235.614 1672.571 2353.452
## May 1994 2379.117 2115.554 2642.679 1976.033 2782.201
## Jun 1994 2445.906 2174.435 2717.376 2030.728 2861.083
## Jul 1994 2943.532 2616.195 3270.870 2442.913 3444.151
## Aug 1994 2688.471 2388.895 2988.047 2230.309 3146.633
## Sep 1994 1998.782 1775.592 2221.971 1657.443 2340.120
## Oct 1994 1857.773 1649.880 2065.666 1539.829 2175.717
## Nov 1994 2165.635 1922.749 2408.521 1794.173 2537.097
## Dec 1994 2344.995 2081.384 2608.606 1941.836 2748.153
# Take a look at the accuracy metrics
accuracy(fortWine.forecast, fortWine.valid)
## ME RMSE MAE MPE MAPE MASE
## Training set -25.32466 287.8687 224.6507 -1.317643 7.229271 0.8073515
## Test set 125.56906 328.9246 256.3940 4.443793 10.858860 0.9214307
## ACF1 Theil's U
## Training set 0.05168201 NA
## Test set -0.01105575 0.7140459
In the training period the NN clearly outperforms exponential smoothing (MAPE 2.1% vs. 7.2%; RMSE 75.7 vs. 287.9). In the validation period the two methods are close: the NN has the lower RMSE (289.1 vs. 328.9) and MASE (0.88 vs. 0.92), while Holt-Winters has a marginally lower MAPE (10.859% vs. 10.881%). In short, the NN autoregression is significantly better in the training period, but the two methods are roughly comparable in the validation period.
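To see the comparison at a glance, the validation-period rows of the two accuracy() calls can be stacked into one small table:
# Side-by-side validation metrics for the NN and Holt-Winters models
rbind(
  NN = accuracy(fortWine.nnetar.pred, fortWine.valid)["Test set", c("RMSE", "MAPE", "MASE")],
  HW = accuracy(fortWine.forecast, fortWine.valid)["Test set", c("RMSE", "MAPE", "MASE")]
)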
Let’s look at a plot comparing the two sets of forecasts against the actuals in the validation period.
plot(c(1994, 1995), c(1000, 3000), type = "n", xlab = "Year of 1994", ylab = "Fortified Wine Sales (Thousands of Liters)", bty = "l", xaxt = "n", yaxt = "n", cex.lab = 0.8)
lines(fortWine.valid, bty = "l")
lines(fortWine.nnetar.pred$mean, col = "blue", lwd = 2)
lines(fortWine.forecast$mean, col = "red", lty = 2)
axis(1, at = seq(1994, 1994 + 11/12, 1/12), labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"), cex.axis = 0.7)
axis(2, at = seq(1000, 3000, 250), labels = format(seq(1000, 3000, 250)), las = 2, cex.axis = 0.7)
legend(1994.25, 1750, c("Fortified Wine Sales", "Neural Network", "Holt-Winters"), lty = c(1,1,2), col = c("black","blue","red"), lwd = c(1,2,1), bty = "n", cex = 0.7)
Since the MAPE from both models is similar, I expected to see the two forecasts move together. However, as noted previously, the NN performs far better on the training set than on the validation set, while Holt-Winters degrades much less between the two. This training-to-validation gap in RMSE and MAPE is much larger for the NN, which suggests the NN is overfitting the training set.
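Finally, the ensemble idea raised at the start is cheap to try: average the two point forecasts and score the combination against the validation series. A minimal sketch with equal weights (the weighting is an arbitrary choice, not something tuned here):
# Equal-weight ensemble of the NN and Holt-Winters point forecasts
ensemble.pred <- (fortWine.nnetar.pred$mean + fortWine.forecast$mean) / 2
accuracy(ensemble.pred, fortWine.valid)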