Use time series forecasting models in R to analyze Australian beer production data.

Summary

Load packages

library(fpp)
library(astsa)
library(DT)
library(dygraphs)

Load the Australian beer production dataset

beer <- read.csv("monthly-beer-production-in-austr.csv")
head(beer)
##     Month Monthly.beer.production.in.Australia
## 1 1956-01                                 93.2
## 2 1956-02                                 96.0
## 3 1956-03                                 95.2
## 4 1956-04                                 77.1
## 5 1956-05                                 70.9
## 6 1956-06                                 64.8
tail(beer)
##       Month Monthly.beer.production.in.Australia
## 471 1995-03                                  152
## 472 1995-04                                  127
## 473 1995-05                                  151
## 474 1995-06                                  130
## 475 1995-07                                  119
## 476 1995-08                                  153

Convert the data frame into a time series object

beer.ts <- ts(beer, frequency = 12, start = c(1956,1), end = c(1994,12))

Time Series Plots

The first step in any time series analysis is to plot the data.

It is also a good idea to aggregate the monthly production volume into quarterly and yearly totals. Depending on the business questions we are trying to answer, different time scales can be very useful.

beer.ts.qtr <- aggregate(beer.ts, nfrequency=4)
beer.ts.yr <- aggregate(beer.ts, nfrequency=1)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")

plot.ts(beer.ts.qtr[,2], main = "Quarterly Beer Production in Australia", xlab = "Year", ylab = "ML")

plot.ts(beer.ts.yr[,2], main = "Yearly Beer Production in Australia", xlab = "Year", ylab = "ML")

As we can see, there was strong growth from the mid-1950s to 1975, after which production slowly declined with higher volatility. We can also see very strong seasonality, which is expected for a product like beer.

Next, we take a look at the seasonality in more detail.

seasonplot(beer.ts[,2], year.labels = TRUE, year.labels.left=TRUE, col=1:40, pch=19, main = "Monthly Beer Production in Australia - seasonplot", xlab = "Month", ylab = "ML")

monthplot(beer.ts[,2], main = "Monthly Beer Production in Australia - monthplot", xlab = "Month", ylab = "ML")

boxplot(beer.ts[,2] ~ cycle(beer.ts[,2]), xlab = "Month", ylab = "ML", main = "Monthly Beer Production in Australia - Boxplot")

These three charts show not only the typical level for each month, but also the range and variation, both across months and within the same month.

Next, we will look at the trend more closely. Moving average smoothing is a great way to examine the trend: by adjusting how many months are averaged, we control the smoothness of the trend line.

par(mfrow = c(2,2))
plot(beer.ts[,2], col="gray", main = "1 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 12), col = "red", lwd=3)
plot(beer.ts[,2], col="gray", main = "3 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 36), col = "blue", lwd=3)
plot(beer.ts[,2], col="gray", main = "5 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 60), col = "green", lwd=3)
plot(beer.ts[,2], col="gray", main = "10 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 120), col = "yellow4", lwd=3)

Transformation and Adjustment

Sometimes, adjusting and transforming the data can make the historical series less complex, so a simpler forecast model can be used. It is also a good idea to remove underlying factors affecting the time series, such as working days, inflation, population, and currency exchange rates.

Calendar Adjustment

Sometimes, the variation in monthly data is simply due to the different number of days in each month. For the monthly beer production data, not only does the number of calendar days in each month affect the production volume; the number of working days also has a significant impact.

plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")

plot.ts(beer.ts[,2]/monthdays(beer.ts[,2]), main = "Monthly Beer Production in Australia - Adjusted By Calendar Days", xlab = "Year", ylab = "ML")

Population Adjustment

When a time series is heavily influenced by population growth, it is good to remove that effect; otherwise it can obscure the underlying factors we are actually interested in. In this case, if we want to know how the beer industry in Australia was really doing, we should remove the production growth that is simply due to population growth, especially since the dataset goes back to the 1950s.

pop <- read.csv("AUS_POP.csv")
plot.ts(beer.ts.yr[,2], main = "Yearly Beer Production in Australia", xlab = "Year", ylab = "ML")

plot(pop[7:45,], main = "Australian Population", xlab = "Year", ylab = "Number of People in 1,000", type = "l")

plot.ts(beer.ts.yr[,2]/pop[7:45,2], main = "Yearly Beer Production in Australia Adjusted by Population", xlab = "Year", ylab = "ML")

We get a quite different picture after removing the effect of population growth. From the mid-1950s to 1975 we still see strong growth. After 1980, however, the slow decline turns out to be a very steep decline once the population factor is removed.

Logarithm transformation

The logarithm is one of the most commonly used mathematical transformations, not only in time series but also in regression models and data visualization. In time series analysis, applying a log transformation stabilizes the variance and turns an exponential trend into a linear one. It is often applied before differencing to improve stationarity.

plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")

plot.ts(log(beer.ts[,2]), main = "Log Transformed Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
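
A closely related option is the Box-Cox transformation, which generalizes the log transform (lambda = 0 is the log case). A minimal sketch, assuming the BoxCox and BoxCox.lambda functions from the forecast package loaded via fpp:

# Estimate a variance-stabilizing lambda and apply the Box-Cox transformation
# (lambda = 0 corresponds to the log transformation used above)
lambda <- BoxCox.lambda(beer.ts[,2])
plot.ts(BoxCox(beer.ts[,2], lambda), main = "Box-Cox Transformed Monthly Beer Production in Australia", xlab = "Year", ylab = "Transformed ML")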

Decomposition

After looking at the different plots, we normally get a good sense of how the time series behaves and what components it contains. Decomposition separates a time series into its components, so we can examine the trend, seasonality, and random noise individually.

STL Decomposition

plot(stl(beer.ts[,2], s.window="periodic"))

From this chart, we can see that the seasonality is strong but consistent. The trend is similar to what we saw when we aggregated the data to yearly values: strong growth from the mid-1950s to 1975, followed by a slow decline. The noise also increases starting around 1975.
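
If we only want the trend plus the remainder, the seasonal component estimated by STL can be removed to give a seasonally adjusted series. A small sketch, assuming the seasadj helper from the forecast package loaded via fpp:

# Remove the STL-estimated seasonal component to get a seasonally adjusted series
beer.stl <- stl(beer.ts[,2], s.window = "periodic")
plot(seasadj(beer.stl), main = "Seasonally Adjusted Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")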

Basic Models Forecast

Simple Average - simple average of all data points

Naive Method - the last observation value

Seasonal Naive - the last observed value from the previous seasonal cycle

Drift Method - forecasts increase or decrease over time based on the average change in the historical data

beer.fit.a <- meanf(beer.ts[,2], h = 120)
beer.fit.n <- naive(beer.ts[,2], h = 120)
beer.fit.sn <- snaive(beer.ts[,2], h = 120)
beer.fit.dri <- rwf(beer.ts[,2], h = 120, drift = TRUE)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1954, 2004))
lines(beer.fit.a$mean, col = "blue")
lines(beer.fit.n$mean, col = "yellow4")
lines(beer.fit.dri$mean, col = "seagreen4")
lines(beer.fit.sn$mean, col = "red")
legend("topleft",lty=1,col=c("blue","yellow4","seagreen4", "red"), cex = 0.75,
       legend=c("Mean method","Naive method","Drift Naive method", "Seasonal naive method"))

In this case, none of these models is good because of the combination of trend and seasonality. However, for a less complex time series these models might be good enough, and they also serve as benchmarks when we move to more complex models.

Regression Analysis

Forecast with Linear Regression

beer.fit.lm <- tslm(beer.ts[,2] ~ trend)
f <- forecast(beer.fit.lm, h = 120, level = c(80,95))
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1954,2004))
lines(f$fitted, col = "blue")
lines(f$mean, col = "blue")

plot(beer.fit.lm$fitted, beer.ts[,2], main = "Scatterplot between fitted & actual values", xlab = "Fitted Value", ylab = "Actual")
abline(0, 1, col="blue")

summary(beer.fit.lm)
## 
## Call:
## tslm(formula = beer.ts[, 2] ~ trend)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -52.50 -16.92  -3.28  13.77  73.93 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 93.957034   2.188016   42.94   <2e-16 ***
## trend        0.180839   0.008085   22.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.63 on 466 degrees of freedom
## Multiple R-squared:  0.5178, Adjusted R-squared:  0.5167 
## F-statistic: 500.3 on 1 and 466 DF,  p-value: < 2.2e-16

It did a good job of picking up the trend. However, we know there is seasonality as well, so we will add another variable to the regression.

Forecast with Linear Regression and Seasonality

beer.fit.lm2 <- tslm(beer.ts[,2] ~ trend + season)
summary(beer.fit.lm2)
## 
## Call:
## tslm(formula = beer.ts[, 2] ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.472 -12.449  -2.105  12.632  51.070 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  98.538349   3.119771  31.585  < 2e-16 ***
## trend         0.179376   0.005989  29.949  < 2e-16 ***
## season2      -9.066555   3.962776  -2.288 0.022599 *  
## season3       2.120736   3.962790   0.535 0.592799    
## season4     -11.579153   3.962812  -2.922 0.003652 ** 
## season5     -15.835452   3.962844  -3.996 7.51e-05 ***
## season6     -29.196879   3.962885  -7.368 8.24e-13 ***
## season7     -19.786511   3.962934  -4.993 8.49e-07 ***
## season8     -13.822297   3.962993  -3.488 0.000534 ***
## season9     -10.573467   3.963061  -2.668 0.007903 ** 
## season10      9.011260   3.963138   2.274 0.023445 *  
## season11     18.037012   3.963224   4.551 6.86e-06 ***
## season12     29.831995   3.963319   7.527 2.81e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.5 on 455 degrees of freedom
## Multiple R-squared:  0.7418, Adjusted R-squared:  0.7349 
## F-statistic: 108.9 on 12 and 455 DF,  p-value: < 2.2e-16
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.lm2$fitted.values, col = "blue")

plot(beer.fit.lm2$fitted, beer.ts[,2], main = "Scatterplot between fitted & actual values", xlab = "Fitted Value", ylab = "Actual")
abline(0, 1, col="blue")

f <- forecast(beer.fit.lm2, h = 120, level = c(80,95))
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1954,2004))
lines(f$mean, col = "blue")

Other Regression Models - Parabola

We can see that the trend is clearly not a straight line. Therefore, we can add a new input variable t^2 to fit a parabola.

t <- seq(1956, 1995.2, length = length(beer.ts[,2]))
t2 <- t^2
beer.fit.lm3 <- tslm(beer.ts[,2] ~ t + t2)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.lm3$fit, col = "blue")

sin.t <- sin(2*pi*t)
cos.t <- cos(2*pi*t)
beer.fit.lm4 <- tslm(beer.ts[,2] ~ t + t2 + sin.t + cos.t)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.lm4$fit, col = "blue")

Exponential Smoothing

Exponential smoothing takes a weighted average of previous observations. By adjusting the alpha, beta, and gamma parameters, we control the weights for the level, trend, and seasonality.
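
To make the weighting concrete, here is a minimal sketch of the simple exponential smoothing recursion, where each new level is alpha times the latest observation plus (1 - alpha) times the previous level; the helper ses_by_hand is purely illustrative, not part of any package:

# l[t] = alpha * y[t] + (1 - alpha) * l[t-1], so older observations get
# geometrically decaying weights controlled by alpha
ses_by_hand <- function(y, alpha) {
  level <- numeric(length(y))
  level[1] <- y[1]                          # initialize the level with the first observation
  for (t in 2:length(y)) {
    level[t] <- alpha * y[t] + (1 - alpha) * level[t - 1]
  }
  level
}
head(ses_by_hand(as.numeric(beer.ts[,2]), alpha = 0.2))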

Simple Exponential Smoothing

Simple exponential smoothing is used for time series without trend or seasonality, so I only use the data from 1993 to 1994 so the analysis makes sense.

beer.ts2 <- window(beer.ts, start = 1993)
plot.ts(beer.ts2[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")

beer.fit.ses1 <- ses(beer.ts2[,2], alpha = 0.2, initial = "simple", h = 12)
beer.fit.ses2 <- ses(beer.ts2[,2], alpha = 0.6, initial = "simple", h = 12)
beer.fit.ses3 <- ses(beer.ts2[,2], h = 12)
plot(beer.fit.ses1, plot.conf=FALSE, type="o", main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.ses1$fitted, col = "blue", type="o")
lines(beer.fit.ses2$fitted, col = "green", type="o")
lines(beer.fit.ses3$fitted, col = "red", type="o")
lines(beer.fit.ses1$mean, col = "blue", type="o")
lines(beer.fit.ses2$mean, col = "green", type="o")
lines(beer.fit.ses3$mean, col = "red", type="o")
legend("topleft",lty=1, col=c(1,"blue","green","red"), 
       c("data", expression(alpha == 0.2), expression(alpha == 0.6),
         expression(alpha == 0.87)), pch = 1)

Simple exponential smoothing only considers the level, so we now use a model that can capture the trend as well.

Holt’s Linear Trend Method

Holt’s linear trend method has three variations: linear trend, exponential trend, and damped trend.

beer.ts.yrl <- aggregate(beer.ts, nfrequency=1)
beer.ts.yrl <- window(beer.ts.yrl , start = 1960, end = 1975)
beer.fit.holt1 <- holt(beer.ts.yrl[,2], alpha = 0.2, beta = 0.2, initial = "simple", h = 6)
beer.fit.holt2 <- holt(beer.ts.yrl[,2], alpha = 0.2, beta = 0.2, initial = "simple", exponential = TRUE, h = 6)
beer.fit.holt3 <- holt(beer.ts.yrl[,2], alpha = 0.2, beta = 0.2, initial = "simple", damped = TRUE, h = 6)
## Warning in holt(beer.ts.yrl[, 2], alpha = 0.2, beta = 0.2, initial =
## "simple", : Damped Holt's method requires optimal initialization
plot(beer.fit.holt1, type="o", fcol="white", main = "Yearly Beer Production in Australia", xlab = "Year", ylab = "ML", plot.conf=FALSE)
lines(beer.fit.holt1$fitted, col = "blue")
lines(beer.fit.holt2$fitted, col = "green")
lines(beer.fit.holt3$fitted, col = "red")
lines(beer.fit.holt1$mean, col = "blue", type="o")
lines(beer.fit.holt2$mean, col = "green", type="o")
lines(beer.fit.holt3$mean, col = "red", type="o")
legend("topleft", lty = 1, col = c("black", "blue", "green", "red"), 
       c("Data", "Holt's Linear Trend", "Exponential Trend", "Damped Trend"))

Holt-Winters’ Seasonal Trend

This is an extension of Holt’s method that adds a seasonal component to capture seasonality.

beer.ts3 <- window(beer.ts, start = 1960, end = 1975)
beer.ts.qtr <- aggregate(beer.ts3, nfrequency=4)
beer.fit.hw1 <- hw(beer.ts.qtr[,2], h = 20, seasonal = "additive")
beer.fit.hw2 <- hw(beer.ts.qtr[,2], h = 20, seasonal = "multiplicative")

plot(beer.fit.hw1, type="o", fcol="white", main = "Quarterly Beer Production in Australia", xlab = "Year", ylab = "ML", plot.conf=FALSE)
lines(beer.fit.hw1$fitted, col = "blue", lty=2)
lines(beer.fit.hw2$fitted, col = "red", lty=2)
lines(beer.fit.hw1$mean, col = "blue", type="o")
lines(beer.fit.hw2$mean, col = "red", type="o")
legend("topleft", lty = 1, pch = 1, col = c("black", "blue", "red"),
       c("Data", "Holt Winters' Additive", "Holt Winters' Multiplicative"))

ETS

People often feel overwhelmed by which model to use, since there are 30 different models to choose from. Luckily, the ets function in R estimates the best model for us, the one with the lowest information criterion score.

beer.fit.ets <- ets(beer.ts.qtr[,2])
plot(forecast((beer.fit.ets), h = 8), xlab = "Year", ylab = "ML")

summary(beer.fit.ets)
## ETS(M,A,M) 
## 
## Call:
##  ets(y = beer.ts.qtr[, 2]) 
## 
##   Smoothing parameters:
##     alpha = 0.1687 
##     beta  = 0.0378 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 267.3213 
##     b = 1.5791 
##     s=1.1716 0.9272 0.8793 1.0219
## 
##   sigma:  0.0303
## 
##      AIC     AICc      BIC 
## 544.4868 547.3103 561.2415 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 2.174717 11.07211 8.416933 0.5127013 2.354354 0.5204242
##                    ACF1
## Training set -0.1910966

We can see that ets selected ETS(M,A,M) for us, i.e., multiplicative errors with the multiplicative Holt-Winters’ method. We can also fit our own model and compare it to the selected one.

beer.fit.ets2 <- ets(beer.ts.qtr[,2], model = "MAN")
plot(forecast((beer.fit.ets2), h = 8), xlab = "Year", ylab = "ML")

summary(beer.fit.ets2)
## ETS(M,A,N) 
## 
## Call:
##  ets(y = beer.ts.qtr[, 2], model = "MAN") 
## 
##   Smoothing parameters:
##     alpha = 0.0117 
##     beta  = 0.0117 
## 
##   Initial states:
##     l = 267.2152 
##     b = 2.5178 
## 
##   sigma:  0.119
## 
##      AIC     AICc      BIC 
## 700.7876 701.5149 709.1650 
## 
## Training set error measures:
##                   ME     RMSE      MAE         MPE     MAPE     MASE
## Training set 4.73209 42.46602 35.12016 -0.09051917 9.693608 2.171501
##                     ACF1
## Training set -0.06330824

It is no surprise that Holt’s trend-only model does not do as well as a seasonal model, given how seasonal our data is. This is confirmed by the AIC of the two models: 700.7876 for Holt’s trend-only model versus 544.4868 for the selected model.
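
As a quick side-by-side check, the information criteria can be pulled straight out of the fitted objects (a small sketch, assuming the aic component stored on ets fits):

# Compare the AIC of the automatically selected ETS(M,A,M) and the manual ETS(M,A,N)
c(ETS_MAM = beer.fit.ets$aic, ETS_MAN = beer.fit.ets2$aic)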

ARIMA

Autocorrelation

Autocorrelation tells us whether consecutive observations are correlated. If there is correlation, we can run a regression using the current observation as the dependent variable and the previous observation as the independent variable.

There are a couple of ways to check whether there is correlation between lags.

head(cbind(beer.ts[-1,2], beer.ts[-468,2]))
##      [,1] [,2]
## [1,] 96.0 93.2
## [2,] 95.2 96.0
## [3,] 77.1 95.2
## [4,] 70.9 77.1
## [5,] 64.8 70.9
## [6,] 70.1 64.8
plot(beer.ts[-468,2], beer.ts[-1,2], main = "Scatterplot - Lag 1", xlab = "Lag 0", ylab = "Lag 1")

cor(beer.ts[-468,2], beer.ts[-1,2])
## [1] 0.8344668

As we can see, the correlation is about 0.83, which means there is a strong relationship between consecutive observations.
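
Since consecutive observations are correlated, we can illustrate the regression idea mentioned above by regressing each observation on the previous one. A minimal sketch (the names beer.lag and beer.fit.ar1 are just illustrative):

# Regress the current observation on the previous one: y[t] ~ y[t-1]
beer.lag <- data.frame(lag1 = beer.ts[-468, 2],   # previous observation, y[t-1]
                       lag0 = beer.ts[-1, 2])     # current observation, y[t]
beer.fit.ar1 <- lm(lag0 ~ lag1, data = beer.lag)
summary(beer.fit.ar1)$coefficients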

lag.plot(beer.ts[,2], lags = 9, do.lines = FALSE)

We can see the correlation declining from lag 1 to lag 7, which makes sense since correlation normally diminishes over time. We can also see the seasonal effect here, since the correlation is weaker at lags 5 to 8 than at lag 9.

Another very useful tool for looking at correlation across lags is the acf function. We can see a very strong relationship from lag 1 to lag 9, and the seasonal effect is visible as well.

acf(beer.ts[,2])

Differencing

Differencing computes the differences between consecutive observations. By differencing the time series data, we can remove the trend and seasonality.

d_beer <- diff(log(beer.ts.qtr[,2]))
plot(d_beer, main = "Differencing logged Quarterly Beer Production")

dd_beer <- diff(d_beer, lag = 4)
plot(dd_beer, main = "Differencing the difference logged Quarterly Beer Production")

The first differencing removed the trend, but we can still see some seasonal effect. By differencing again at lag 4 (one seasonal cycle for quarterly data), the seasonality is removed and the series should now be stationary. We can also see this in the ACF charts.

acf2(d_beer)

##         ACF  PACF
##  [1,] -0.11 -0.11
##  [2,] -0.77 -0.79
##  [3,] -0.06 -0.79
##  [4,]  0.88  0.18
##  [5,] -0.07 -0.01
##  [6,] -0.71  0.11
##  [7,] -0.05  0.00
##  [8,]  0.79  0.08
##  [9,] -0.05  0.03
## [10,] -0.66  0.07
## [11,] -0.06 -0.06
## [12,]  0.73  0.06
## [13,] -0.04 -0.07
## [14,] -0.60  0.04
## [15,] -0.06 -0.03
## [16,]  0.67 -0.01
## [17,] -0.03 -0.03
## [18,] -0.56 -0.06
acf2(dd_beer)

##         ACF  PACF
##  [1,] -0.58 -0.58
##  [2,] -0.04 -0.57
##  [3,]  0.42  0.10
##  [4,] -0.39  0.03
##  [5,]  0.06 -0.13
##  [6,]  0.17 -0.14
##  [7,] -0.16  0.04
##  [8,] -0.04 -0.10
##  [9,]  0.13 -0.14
## [10,] -0.15 -0.23
## [11,]  0.10  0.04
## [12,]  0.00  0.04
## [13,] -0.10 -0.12
## [14,]  0.17  0.01
## [15,] -0.10  0.14
## [16,] -0.06 -0.04
## [17,]  0.18 -0.05
## [18,] -0.15 -0.03
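
As a complementary check on how much differencing is needed, the forecast package (loaded via fpp) provides unit-root based helpers; a small sketch using their defaults:

# Suggested number of seasonal (D) and regular (d) differences for stationarity
nsdiffs(log(beer.ts.qtr[,2]))
ndiffs(log(beer.ts.qtr[,2]))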

ARIMA Model Selection

By analyzing the ACF and PACF charts after differencing, we can fit a model by choosing the p, d, q, P, D, Q, S parameters of an ARIMA(p,d,q)(P,D,Q)S model.

We now try to pick the best p, d, q, P, D, Q, S parameters based on the ACF and PACF charts.

sarima(beer.ts.qtr[,2], 2, 1, 0, 2, 1, 2, 4)
## initial  value 3.073841 
## iter   2 value 2.628819
## iter   3 value 2.572596
## iter   4 value 2.551486
## iter   5 value 2.438811
## iter   6 value 2.435508
## iter   7 value 2.434934
## iter   8 value 2.434246
## iter   9 value 2.434191
## iter  10 value 2.434144
## iter  11 value 2.434093
## iter  12 value 2.434065
## iter  13 value 2.434001
## iter  14 value 2.433882
## iter  15 value 2.433843
## iter  16 value 2.433790
## iter  17 value 2.433756
## iter  18 value 2.433738
## iter  19 value 2.433732
## iter  20 value 2.433715
## iter  21 value 2.433686
## iter  22 value 2.433650
## iter  23 value 2.433627
## iter  24 value 2.433621
## iter  25 value 2.433621
## iter  25 value 2.433621
## iter  25 value 2.433621
## final  value 2.433621 
## converged
## initial  value 2.481238 
## iter   2 value 2.468996
## iter   3 value 2.463620
## iter   4 value 2.460036
## iter   5 value 2.458727
## iter   6 value 2.458621
## iter   7 value 2.458382
## iter   8 value 2.457405
## iter   9 value 2.453468
## iter  10 value 2.452171
## iter  11 value 2.450947
## iter  12 value 2.449924
## iter  13 value 2.447893
## iter  14 value 2.442364
## iter  15 value 2.441109
## iter  16 value 2.439864
## iter  17 value 2.438823
## iter  18 value 2.438209
## iter  19 value 2.438081
## iter  20 value 2.437914
## iter  21 value 2.437815
## iter  22 value 2.437789
## iter  23 value 2.437787
## iter  24 value 2.437786
## iter  25 value 2.437785
## iter  26 value 2.437785
## iter  27 value 2.437784
## iter  28 value 2.437784
## iter  29 value 2.437784
## iter  30 value 2.437784
## iter  30 value 2.437783
## iter  30 value 2.437783
## final  value 2.437783 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), include.mean = !no.constant, optim.control = list(trace = trc, 
##     REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ar2     sar1     sar2    sma1     sma2
##       -0.9929  -0.6617  -0.6507  -0.0927  0.4335  -0.5016
## s.e.   0.1280   0.1245   0.3155   0.2752  0.4704   0.3342
## 
## sigma^2 estimated as 115.5:  log likelihood = -212.12,  aic = 438.24
## 
## $degrees_of_freedom
## [1] 54
## 
## $ttable
##      Estimate     SE t.value p.value
## ar1   -0.9929 0.1280 -7.7560  0.0000
## ar2   -0.6617 0.1245 -5.3162  0.0000
## sar1  -0.6507 0.3155 -2.0622  0.0440
## sar2  -0.0927 0.2752 -0.3371  0.7374
## sma1   0.4335 0.4704  0.9215  0.3609
## sma2  -0.5016 0.3342 -1.5007  0.1393
## 
## $AIC
## [1] 5.949558
## 
## $AICc
## [1] 6.018789
## 
## $BIC
## [1] 5.158992

Looking at the residual diagnostics, we seem to have a workable model: the residuals look normally distributed, the ACF of the residuals stays within the 95% confidence bounds, and the Ljung-Box p-values are above 0.05. We can therefore use this ARIMA model to forecast.

beer.fit.arima <- arima(beer.ts.qtr[,2], order = c(2, 1, 0), seasonal = c(2, 1, 2))
fcst.arima <- forecast(beer.fit.arima, h = 20)
plot(fcst.arima)
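
sarima already printed residual diagnostics above; we can also run the Ljung-Box test directly on the residuals of the refitted model (a quick sketch using base R's Box.test; the lag of 8 is an arbitrary choice, and fitdf = 6 matches the six estimated coefficients):

# A large p-value indicates no evidence of remaining autocorrelation in the residuals
Box.test(residuals(beer.fit.arima), lag = 8, type = "Ljung-Box", fitdf = 6)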

Auto.ARIMA

Just like ets, there is an auto.arima function in R that selects the best estimated ARIMA model for us (the one with the lowest information criterion score).

beer.fit.arima.auto <- auto.arima(beer.ts.qtr[,2])
beer.fit.arima.auto
## Series: beer.ts.qtr[, 2] 
## ARIMA(0,1,2)(0,1,1)[4]                    
## 
## Coefficients:
##           ma1     ma2     sma1
##       -1.1509  0.6521  -0.6856
## s.e.   0.1306  0.1965   0.1013
## 
## sigma^2 estimated as 145.2:  log likelihood=-215.04
## AIC=438.08   AICc=438.88   BIC=446.11
fcst.auto <- forecast(beer.fit.arima.auto, h = 20)
plot(fcst.auto)

We now compare these two ARIMA models. As we can see, they produce very similar results.

plot(beer.ts.qtr[,2], main = "Quarterly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1960, 1980), ylim = c(200, 700))
lines(fcst.auto$mean, col = "blue", type = "o")
lines(fcst.arima$mean, col = "red", type = "o")
legend("topleft", lty = 1, pch = 1, col = c("blue", "red"),
       c("ARIMA(0,1,2)(0,1,1)4", "ARIMA(2,1,0)(2,1,2)4"))

Neural Network Autoregression

Neural networks are probably among the hottest machine learning algorithms; they are loosely modeled on the human brain and nervous system. Lagged values of the time series can be used as inputs to a neural network, similar to an autoregression model.

beer.fit.nn <- nnetar(beer.ts.qtr[,2])
plot(forecast(beer.fit.nn, h = 20), xlab = "Year", ylab = "ML", xlim = c(1960, 1980), ylim = c(200, 700))
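
By default nnetar chooses the number of lagged inputs and hidden nodes automatically; they can also be set explicitly. A hedged sketch, with arbitrary, untuned values:

# p = non-seasonal lags, P = seasonal lags, size = hidden nodes (values here are illustrative only)
beer.fit.nn2 <- nnetar(beer.ts.qtr[,2], p = 4, P = 1, size = 3)
plot(forecast(beer.fit.nn2, h = 20), xlab = "Year", ylab = "ML", xlim = c(1960, 1980), ylim = c(200, 700))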

Evaluating Forecast Accuracy

Here are some common measurements used to evaluate forecast accuracy.
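
For reference, the main scale-dependent and percentage measures reported by accuracy() can be computed directly from the forecast errors. A minimal sketch; calc_accuracy is just an illustrative helper:

# e = actual - forecast
calc_accuracy <- function(actual, forecast) {
  e <- actual - forecast
  c(ME   = mean(e),                        # Mean Error
    RMSE = sqrt(mean(e^2)),                # Root Mean Squared Error
    MAE  = mean(abs(e)),                   # Mean Absolute Error
    MAPE = mean(abs(e / actual)) * 100)    # Mean Absolute Percentage Error
}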

Compare forecast accuracy among the 4 basic forecasting models (simple average, naive method, seasonal naive, and drift method)

beer.ts.ac <- window(beer.ts, start = 1960, end = 1990)
beer.fit.a <- meanf(beer.ts.ac[,2], h = 60)
beer.fit.n <- naive(beer.ts.ac[,2], h = 60)
beer.fit.sn <- snaive(beer.ts.ac[,2], h = 60)
beer.fit.dri <- rwf(beer.ts.ac[,2], h = 60, drift = TRUE)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1960, 1994))
lines(beer.fit.a$mean, col = "blue")
lines(beer.fit.n$mean, col = "yellow4")
lines(beer.fit.dri$mean, col = "seagreen4")
lines(beer.fit.sn$mean, col = "red")
legend("topleft",lty=1,col=c("blue","yellow4","seagreen4", "red"), cex = 0.75,
       legend=c("Mean method","Naive method","Drift Naive method", "Seasonal naive method"))

Based on the graph and our understanding of the dataset, we should expect the seasonal naive model to be the best.

accuracy(beer.fit.a, beer.ts)
##                        ME      RMSE       MAE       MPE     MAPE     MASE
## Training set 3.415550e-15  31.87576  25.94337 -5.944666 20.63945 2.554952
## Test set     2.988609e+02 299.34572 298.86094 68.029497 68.02950       NA
##                   ACF1 Theil's U
## Training set 0.7953819        NA
## Test set     0.9491525        NA
accuracy(beer.fit.n, beer.ts)
##                       ME      RMSE      MAE        MPE     MAPE     MASE
## Training set   0.2080556  20.22375  15.4775 -0.8056724 11.00421 1.524253
## Test set     267.0000000 267.54252 267.0000 60.7609288 60.76093       NA
##                    ACF1 Theil's U
## Training set -0.2588384        NA
## Test set      0.9491525        NA
accuracy(beer.fit.sn, beer.ts)
##                      ME      RMSE       MAE       MPE      MAPE MASE
## Training set   2.455014  13.50092  10.15415  1.590586  7.050483    1
## Test set     277.406780 278.48747 277.40678 63.154592 63.154592   NA
##                    ACF1 Theil's U
## Training set -0.1172677        NA
## Test set      0.6235879        NA
accuracy(beer.fit.dri, beer.ts)
##                        ME      RMSE      MAE        MPE     MAPE     MASE
## Training set 4.262953e-15  20.22268  15.4695 -0.9628034 11.00613 1.523465
## Test set     2.607583e+02 261.10685 260.7583 59.3683841 59.36838       NA
##                    ACF1 Theil's U
## Training set -0.2588384        NA
## Test set      0.9491525        NA

As we can see, the seasonal naive model does have the lowest training-set error scores.