Use time series forecasting models in R to analyze Australian beer production data.
Summary
Time Series Plots
Data Transformation and Adjustment
Decomposition
Simple Average
Naive Method
Time Series Linear Regression
Simple Exponential Smoothing
Holt’s Linear Trend
Holt-Winters’ Seasonal Trend
ETS
ARIMA
Neural Network Autoregression
Forecast Accuracy Evaluation
Load packages
library(fpp)
library(astsa)
library(DT)
library(dygraphs)
Load the Australian beer production dataset
beer <- read.csv("monthly-beer-production-in-austr.csv")
head(beer)
## Month Monthly.beer.production.in.Australia
## 1 1956-01 93.2
## 2 1956-02 96.0
## 3 1956-03 95.2
## 4 1956-04 77.1
## 5 1956-05 70.9
## 6 1956-06 64.8
tail(beer)
## Month Monthly.beer.production.in.Australia
## 471 1995-03 152
## 472 1995-04 127
## 473 1995-05 151
## 474 1995-06 130
## 475 1995-07 119
## 476 1995-08 153
Convert the data frame into a time series object
beer.ts <- ts(beer, frequency = 12, start = c(1956,1), end = c(1994,12))
The first thing to do in any time series analysis is to plot the data.
It is also a good idea to aggregate the monthly production volume into quarterly and yearly totals. Depending on the business questions we are trying to answer, different time scales can be very useful.
beer.ts.qtr <- aggregate(beer.ts, nfrequency=4)
beer.ts.yr <- aggregate(beer.ts, nfrequency=1)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
plot.ts(beer.ts.qtr[,2], main = "Quarterly Beer Production in Australia", xlab = "Year", ylab = "ML")
plot.ts(beer.ts.yr[,2], main = "Yearly Beer Production in Australia", xlab = "Year", ylab = "ML")
As we can see, there was strong growth from the late 1950s to about 1975, after which production slowly declined with higher volatility. We can also see very strong seasonality, which is expected for a product like beer.
Next, we take a look at the seasonality in more detail.
seasonplot(beer.ts[,2], year.labels = TRUE, year.labels.left=TRUE, col=1:40, pch=19, main = "Monthly Beer Production in Australia - seasonplot", xlab = "Month", ylab = "ML")
monthplot(beer.ts[,2], main = "Monthly Beer Production in Australia - monthplot", xlab = "Month", ylab = "ML")
boxplot(beer.ts[,2] ~ cycle(beer.ts[,2]), xlab = "Month", ylab = "ML", main = "Monthly Beer Production in Australia - Boxplot")
These three charts all show the typical level for each month, as well as the range and variation within each month.
Next, we look at the trend more closely. Moving average smoothing is a great way to examine the trend: by adjusting how many months go into the average, we control the smoothness of the trend line.
par(mfrow = c(2,2))
plot(beer.ts[,2], col="gray", main = "1 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 12), col = "red", lwd=3)
plot(beer.ts[,2], col="gray", main = "3 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 36), col = "blue", lwd=3)
plot(beer.ts[,2], col="gray", main = "5 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 60), col = "green", lwd=3)
plot(beer.ts[,2], col="gray", main = "10 Year Moving Average Smoothing")
lines(ma(beer.ts[,2], order = 120), col = "yellow4", lwd=3)
Sometimes, adjusting and transforming the data makes the history less complex, so a simpler forecasting model can be used. It is also a good idea to remove underlying factors that affect the time series, such as working days, inflation, population, and currency exchange rates.
Calendar Adjustment
Sometimes the variation in monthly data is simply due to the different number of days in each month. For the monthly beer production data, not only does the number of calendar days in each month affect the production volume, but the number of working days has a significant impact as well.
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
plot.ts(beer.ts[,2]/monthdays(beer.ts[,2]), main = "Monthly Beer Production in Australia - Adjusted By Calendar Days", xlab = "Year", ylab = "ML")
Population Adjustment
When a time series is heavily influenced by population growth, it is a good idea to remove that effect; otherwise it can hide the underlying factors we are actually interested in. In this case, if we really want to know how the beer industry was doing in Australia, we should remove the production growth that is simply due to population growth, especially since the dataset goes back to the 1950s.
pop <- read.csv("AUS_POP.csv")
plot.ts(beer.ts.yr[,2], main = "Yearly Beer Production in Australia", xlab = "Year", ylab = "ML")
plot(pop[7:45,], main = "Australia Population", xlab = "Year", ylab = "Number of People in 1,000", type = "l")
plot.ts(beer.ts.yr[,2]/pop[7:45,2], main = "Yearly Beer Production in Australia Adjusted by Population", xlab = "Year", ylab = "ML")
Removing the effect of population growth gives us a very different picture. From the late 1950s to 1975 we still see strong growth. However, after 1980, what looked like a slow decline turns out to be a very steep decline once the population factor is removed.
Logarithm transformation
The logarithm is one of the most commonly used mathematical transformations, not only in time series analysis but also in regression modeling and data visualization more generally. In time series analysis, a log transformation stabilizes the variance and turns an exponential trend into a linear one. It is often applied before differencing to improve stationarity.
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
plot.ts(log(beer.ts[,2]), main = "Log Transformated Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
Just by looking at the different plots, we can usually get a good sense of how the time series behaves and which components it contains. Decomposition is a tool that separates a time series into its components so we can examine the trend, seasonality, and random noise individually.
STL Decomposition
plot(stl(beer.ts[,2], s.window="periodic"))
From this chart, we can see that the seasonality is strong but consistent. The trend is similar to what we saw when we aggregated the data to yearly values: strong growth up to about 1975, followed by a slow decline. The remainder (noise) also becomes larger from about 1975 onward.
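As an optional follow-up to the decomposition (a small sketch, not part of the original analysis), we can also extract the seasonally adjusted series from an STL fit using seasadj() from the forecast package, which is loaded along with fpp.
beer.stl <- stl(beer.ts[,2], s.window = "periodic")
plot.ts(seasadj(beer.stl), main = "Seasonally Adjusted Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")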
We start with four simple methods that serve as benchmarks:
Simple Average - the forecast is the average of all historical data points
Naive Method - the forecast is the last observed value
Seasonal Naive - the forecast is the last observed value from the corresponding season of the previous cycle
Drift Method - the forecast increases or decreases over time based on the average change in the historical data
beer.fit.a <- meanf(beer.ts[,2], h = 120)
beer.fit.n <- naive(beer.ts[,2], h = 120)
beer.fit.sn <- snaive(beer.ts[,2], h = 120)
beer.fit.dri <- rwf(beer.ts[,2], h = 120, drift = TRUE)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1954, 2004))
lines(beer.fit.a$mean, col = "blue")
lines(beer.fit.n$mean, col = "yellow4")
lines(beer.fit.dri$mean, col = "seagreen4")
lines(beer.fit.sn$mean, col = "red")
legend("topleft",lty=1,col=c("blue","yellow4","seagreen4", "red"), cex = 0.75,
legend=c("Mean method","Naive method","Drift Naive method", "Seasonal naive method"))
In this case, none of these are good models because of the combination of trend and seasonality in the data. However, for a less complex time series they might be good enough, and they also serve as benchmarks as we move on to more complex models.
Forecast with Linear Regression
beer.fit.lm <- tslm(beer.ts[,2] ~ trend)
f <- forecast(beer.fit.lm, h = 120, level = c(80,95))
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1954,2004))
lines(f$fitted, col = "blue")
lines(f$mean, col = "blue")
plot(beer.fit.lm$fitted, beer.ts[,2], main = "Scatterplot between fitted & actual values", xlab = "Fitted Value", ylab = "Actual")
abline(0, 1, col="blue")
summary(beer.fit.lm)
##
## Call:
## tslm(formula = beer.ts[, 2] ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.50 -16.92 -3.28 13.77 73.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 93.957034 2.188016 42.94 <2e-16 ***
## trend 0.180839 0.008085 22.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.63 on 466 degrees of freedom
## Multiple R-squared: 0.5178, Adjusted R-squared: 0.5167
## F-statistic: 500.3 on 1 and 466 DF, p-value: < 2.2e-16
It did a good job of picking up the trend. However, we know there is seasonality as well, so we will add another variable to the regression.
Forecast with Linear Regression and Seasonality
beer.fit.lm2 <- tslm(beer.ts[,2] ~ trend + season)
summary(beer.fit.lm2)
##
## Call:
## tslm(formula = beer.ts[, 2] ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.472 -12.449 -2.105 12.632 51.070
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.538349 3.119771 31.585 < 2e-16 ***
## trend 0.179376 0.005989 29.949 < 2e-16 ***
## season2 -9.066555 3.962776 -2.288 0.022599 *
## season3 2.120736 3.962790 0.535 0.592799
## season4 -11.579153 3.962812 -2.922 0.003652 **
## season5 -15.835452 3.962844 -3.996 7.51e-05 ***
## season6 -29.196879 3.962885 -7.368 8.24e-13 ***
## season7 -19.786511 3.962934 -4.993 8.49e-07 ***
## season8 -13.822297 3.962993 -3.488 0.000534 ***
## season9 -10.573467 3.963061 -2.668 0.007903 **
## season10 9.011260 3.963138 2.274 0.023445 *
## season11 18.037012 3.963224 4.551 6.86e-06 ***
## season12 29.831995 3.963319 7.527 2.81e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.5 on 455 degrees of freedom
## Multiple R-squared: 0.7418, Adjusted R-squared: 0.7349
## F-statistic: 108.9 on 12 and 455 DF, p-value: < 2.2e-16
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.lm2$fitted.values, col = "blue")
plot(beer.fit.lm2$fitted, beer.ts[,2], main = "Scatterplot between fitted & actual values", xlab = "Fitted Value", ylab = "Actual")
abline(0, 1, col="blue")
f <- forecast(beer.fit.lm2, h = 120, level = c(80,95))
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1954,2004))
lines(f$mean, col = "blue")
Other Regression Models - Parabola
We can see that the trend is clearly not a straight line, so we can add a new input variable, t^2, to fit a parabola.
t <- seq(1956, 1995.2, length = length(beer.ts[,2]))
t2 <- t^2
beer.fit.lm3 <- tslm(beer.ts[,2] ~ t + t2)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.lm3$fit, col = "blue")
On top of the quadratic trend, we can also add sine and cosine terms of t to capture the yearly cycle (a simple harmonic regression):
sin.t <- sin(2*pi*t)
cos.t <- cos(2*pi*t)
beer.fit.lm4 <- tslm(beer.ts[,2] ~ t + t2 + sin.t + cos.t)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.lm4$fit, col = "blue")
Exponential smoothing essentially takes a weighted average of previous observations, with the weights decaying exponentially as the observations get older. By adjusting the alpha, beta, and gamma parameters, we control the weights for the level, trend, and seasonality components.
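To make the idea of exponentially decaying weights concrete, here is a minimal sketch (alpha = 0.2 is just an assumption for illustration): each weight is alpha times (1 - alpha) raised to the age of the observation.
alpha <- 0.2
weights <- alpha * (1 - alpha)^(0:9)
round(weights, 3)  # weights applied to the most recent 10 observations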
Simple Exponential Smoothing
Simple exponential smoothing is used for time series without trend or seasonality, so I only picked the window from 1993 to 1994 so that the analysis makes sense.
beer.ts2 <- window(beer.ts, start = 1993)
plot.ts(beer.ts2[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
beer.fit.ses1 <- ses(beer.ts2[,2], alpha = 0.2, initial = "simple", h = 12)
beer.fit.ses2 <- ses(beer.ts2[,2], alpha = 0.6, initial = "simple", h = 12)
beer.fit.ses3 <- ses(beer.ts2[,2], h = 12)
plot(beer.fit.ses1, plot.conf=FALSE, type="o", main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML")
lines(beer.fit.ses1$fitted, col = "blue", type="o")
lines(beer.fit.ses2$fitted, col = "green", type="o")
lines(beer.fit.ses3$fitted, col = "red", type="o")
lines(beer.fit.ses1$mean, col = "blue", type="o")
lines(beer.fit.ses2$mean, col = "green", type="o")
lines(beer.fit.ses3$mean, col = "red", type="o")
legend("topleft",lty=1, col=c(1,"blue","green","red"),
c("data", expression(alpha == 0.2), expression(alpha == 0.6),
expression(alpha == 0.87)), pch = 1)
Simple exponential smoothing only considers the level, so we now move to a model that can capture the trend as well.
Holt’s Linear Trend Method
Holt's linear trend method comes in three variations: the standard linear trend, an exponential trend, and a damped trend.
beer.ts.yrl <- aggregate(beer.ts, nfrequency=1)
beer.ts.yrl <- window(beer.ts.yrl , start = 1960, end = 1975)
beer.fit.holt1 <- holt(beer.ts.yrl[,2], alpha = 0.2, beta = 0.2, initial = "simple", h = 6)
beer.fit.holt2 <- holt(beer.ts.yrl[,2], alpha = 0.2, beta = 0.2, initial = "simple", exponential = TRUE, h = 6)
beer.fit.holt3 <- holt(beer.ts.yrl[,2], alpha = 0.2, beta = 0.2, initial = "simple", damped = TRUE, h = 6)
## Warning in holt(beer.ts.yrl[, 2], alpha = 0.2, beta = 0.2, initial =
## "simple", : Damped Holt's method requires optimal initialization
plot(beer.fit.holt1, type="o", fcol="white", main = "Yearly Beer Production in Australia", xlab = "Year", ylab = "ML", plot.conf=FALSE)
lines(beer.fit.holt1$fitted, col = "blue")
lines(beer.fit.holt2$fitted, col = "green")
lines(beer.fit.holt3$fitted, col = "red")
lines(beer.fit.holt1$mean, col = "blue", type="o")
lines(beer.fit.holt2$mean, col = "green", type="o")
lines(beer.fit.holt3$mean, col = "red", type="o")
legend("topleft", lty = 1, col = c("black", "blue", "green", "red"),
c("Data", "Holt's Linear Trend", "Exponential Trend", "Damped Trend"))
Holt-Winters’ Seasonal Trend
This extends Holt's method by adding a seasonal component to capture seasonality; the seasonal component can be additive or multiplicative.
beer.ts3 <- window(beer.ts, start = 1960, end = 1975)
beer.ts.qtr <- aggregate(beer.ts3, nfrequency=4)
beer.fit.hw1 <- hw(beer.ts.qtr[,2], h = 20, seasonal = "additive")
beer.fit.hw2 <- hw(beer.ts.qtr[,2], h = 20, seasonal = "multiplicative")
plot(beer.fit.hw1, type="o", fcol="white", main = "Quarterly Beer Production in Australia", xlab = "Year", ylab = "ML", plot.conf=FALSE)
lines(beer.fit.hw1$fitted, col = "blue", lty=2)
lines(beer.fit.hw2$fitted, col = "red", lty=2)
lines(beer.fit.hw1$mean, col = "blue", type="o")
lines(beer.fit.hw2$mean, col = "red", type="o")
legend("topleft", lty = 1, pch = 1, col = c("black", "blue", "red"),
c("Data", "Holt Winters' Additive", "Holt Winters' Multiplicative"))
ETS
People often feel overwhelmed by which model to use, since there are 30 different models to choose from (2 error types x 5 trend types x 3 seasonal types). Luckily, the ets() function in R estimates the best model for us, which is the one with the lowest AIC score.
beer.fit.ets <- ets(beer.ts.qtr[,2])
plot(forecast((beer.fit.ets), h = 8), xlab = "Year", ylab = "ML")
summary(beer.fit.ets)
## ETS(M,A,M)
##
## Call:
## ets(y = beer.ts.qtr[, 2])
##
## Smoothing parameters:
## alpha = 0.1687
## beta = 0.0378
## gamma = 1e-04
##
## Initial states:
## l = 267.3213
## b = 1.5791
## s=1.1716 0.9272 0.8793 1.0219
##
## sigma: 0.0303
##
## AIC AICc BIC
## 544.4868 547.3103 561.2415
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 2.174717 11.07211 8.416933 0.5127013 2.354354 0.5204242
## ACF1
## Training set -0.1910966
So ets() selected ETS(M,A,M) for us, which is multiplicative errors with the multiplicative Holt-Winters' method. We can also fit a model of our own and compare it to the selected one.
beer.fit.ets2 <- ets(beer.ts.qtr[,2], model = "MAN")
plot(forecast((beer.fit.ets2), h = 8), xlab = "Year", ylab = "ML")
summary(beer.fit.ets2)
## ETS(M,A,N)
##
## Call:
## ets(y = beer.ts.qtr[, 2], model = "MAN")
##
## Smoothing parameters:
## alpha = 0.0117
## beta = 0.0117
##
## Initial states:
## l = 267.2152
## b = 2.5178
##
## sigma: 0.119
##
## AIC AICc BIC
## 700.7876 701.5149 709.1650
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.73209 42.46602 35.12016 -0.09051917 9.693608 2.171501
## ACF1
## Training set -0.06330824
It is no surprise that Holt's trend-only model does not do as well as a model with a seasonal component, given how seasonal our data is. This is confirmed by the AIC of the two models: 700.7876 for Holt's trend-only model versus 544.4868 for the selected model.
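For convenience, we can also pull the AIC values straight from the fitted ets objects and compare them side by side (a small optional sketch; the $aic component is part of the object returned by ets()).
c(ETS.MAM = beer.fit.ets$aic, ETS.MAN = beer.fit.ets2$aic)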
Autocorrelation
Autocorrelation measures whether consecutive observations are correlated. If they are, we can run a regression using the current observation as the dependent variable and previous observations as independent variables.
There are a couple of ways to check whether there is correlation between lags.
head(cbind(beer.ts[-1,2], beer.ts[-468,2]))
## [,1] [,2]
## [1,] 96.0 93.2
## [2,] 95.2 96.0
## [3,] 77.1 95.2
## [4,] 70.9 77.1
## [5,] 64.8 70.9
## [6,] 70.1 64.8
plot(beer.ts[-468,2], beer.ts[-1,2], main = "Scatterplot - Lag 1", xlab = "Lag 0", ylab = "Lag 1")
cor(beer.ts[-468,2], beer.ts[-1,2])
## [1] 0.8344668
As we can see, the correlation is about 0.83, which indicates a strong relationship between consecutive observations.
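Since consecutive observations are correlated, we could also fit the lag-1 regression described above directly with lm(); this is just a quick illustrative sketch, not one of the forecasting models used later.
lag1.fit <- lm(beer.ts[-1,2] ~ beer.ts[-468,2])  # regress each value on the previous one
summary(lag1.fit)$r.squared                      # roughly the squared lag-1 correlation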
lag.plot(beer.ts[,2], lags = 9, do.lines = FALSE)
We can see the correlation declining from lag 1 to lag 7, which makes sense since correlation normally diminishes with time. We can also see the seasonal effect here: the correlation is weaker at lags 5 to 8 than at lag 9.
Another very useful tool for looking at the correlation across lags is the acf() function. It shows a very strong relationship from lag 1 to lag 9, and the seasonal effect is visible as well.
acf(beer.ts[,2])
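To complement the visual check, we can run a Ljung-Box test, which formally tests whether the autocorrelations up to a given lag are jointly zero (a small optional sketch using base R's Box.test(); lag = 12 is an assumption chosen to match the monthly seasonality). Given the strong correlations above, we would expect a very small p-value here.
Box.test(beer.ts[,2], lag = 12, type = "Ljung-Box")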
Differencing
Differencing computes the differences between consecutive observations. By differencing the time series data, we can remove the trend and seasonality.
d_beer <- diff(log(beer.ts.qtr[,2]))
plot(d_beer, main = "Differencing logged Quarterly Beer Production")
dd_beer <- diff(d_beer, lag = 4)
plot(dd_beer, main = "Differencing the difference logged Quarterly Beer Production")
The first differencing removed the trend, but we can still see some seasonal effect. By taking another difference at lag 4 (the seasonal lag for quarterly data), we removed the seasonality, and the series should now be stationary. We can also see this in the ACF charts.
acf2(d_beer)
## ACF PACF
## [1,] -0.11 -0.11
## [2,] -0.77 -0.79
## [3,] -0.06 -0.79
## [4,] 0.88 0.18
## [5,] -0.07 -0.01
## [6,] -0.71 0.11
## [7,] -0.05 0.00
## [8,] 0.79 0.08
## [9,] -0.05 0.03
## [10,] -0.66 0.07
## [11,] -0.06 -0.06
## [12,] 0.73 0.06
## [13,] -0.04 -0.07
## [14,] -0.60 0.04
## [15,] -0.06 -0.03
## [16,] 0.67 -0.01
## [17,] -0.03 -0.03
## [18,] -0.56 -0.06
acf2(dd_beer)
## ACF PACF
## [1,] -0.58 -0.58
## [2,] -0.04 -0.57
## [3,] 0.42 0.10
## [4,] -0.39 0.03
## [5,] 0.06 -0.13
## [6,] 0.17 -0.14
## [7,] -0.16 0.04
## [8,] -0.04 -0.10
## [9,] 0.13 -0.14
## [10,] -0.15 -0.23
## [11,] 0.10 0.04
## [12,] 0.00 0.04
## [13,] -0.10 -0.12
## [14,] 0.17 0.01
## [15,] -0.10 0.14
## [16,] -0.06 -0.04
## [17,] 0.18 -0.05
## [18,] -0.15 -0.03
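As a cross-check on how much differencing is needed, the forecast package also provides ndiffs() and nsdiffs(), which estimate the number of ordinary and seasonal differences required to make a series stationary (an optional sketch; the exact answers depend on the unit-root and seasonal-strength tests these functions use).
ndiffs(beer.ts.qtr[,2])   # suggested number of ordinary differences
nsdiffs(beer.ts.qtr[,2])  # suggested number of seasonal differences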
ARIMA Model Selection
By analyzing the ACF and PACF charts after differencing, we can fit a model by choosing the p, d, q, P, D, Q, and S parameters of a seasonal ARIMA(p,d,q)(P,D,Q)[S] model. We now pick these parameters based on the ACF and PACF charts above.
sarima(beer.ts.qtr[,2], 2, 1, 0, 2, 1, 2, 4)
## initial value 3.073841
## iter 2 value 2.628819
## iter 3 value 2.572596
## iter 4 value 2.551486
## iter 5 value 2.438811
## iter 6 value 2.435508
## iter 7 value 2.434934
## iter 8 value 2.434246
## iter 9 value 2.434191
## iter 10 value 2.434144
## iter 11 value 2.434093
## iter 12 value 2.434065
## iter 13 value 2.434001
## iter 14 value 2.433882
## iter 15 value 2.433843
## iter 16 value 2.433790
## iter 17 value 2.433756
## iter 18 value 2.433738
## iter 19 value 2.433732
## iter 20 value 2.433715
## iter 21 value 2.433686
## iter 22 value 2.433650
## iter 23 value 2.433627
## iter 24 value 2.433621
## iter 25 value 2.433621
## iter 25 value 2.433621
## iter 25 value 2.433621
## final value 2.433621
## converged
## initial value 2.481238
## iter 2 value 2.468996
## iter 3 value 2.463620
## iter 4 value 2.460036
## iter 5 value 2.458727
## iter 6 value 2.458621
## iter 7 value 2.458382
## iter 8 value 2.457405
## iter 9 value 2.453468
## iter 10 value 2.452171
## iter 11 value 2.450947
## iter 12 value 2.449924
## iter 13 value 2.447893
## iter 14 value 2.442364
## iter 15 value 2.441109
## iter 16 value 2.439864
## iter 17 value 2.438823
## iter 18 value 2.438209
## iter 19 value 2.438081
## iter 20 value 2.437914
## iter 21 value 2.437815
## iter 22 value 2.437789
## iter 23 value 2.437787
## iter 24 value 2.437786
## iter 25 value 2.437785
## iter 26 value 2.437785
## iter 27 value 2.437784
## iter 28 value 2.437784
## iter 29 value 2.437784
## iter 30 value 2.437784
## iter 30 value 2.437783
## iter 30 value 2.437783
## final value 2.437783
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 sar1 sar2 sma1 sma2
## -0.9929 -0.6617 -0.6507 -0.0927 0.4335 -0.5016
## s.e. 0.1280 0.1245 0.3155 0.2752 0.4704 0.3342
##
## sigma^2 estimated as 115.5: log likelihood = -212.12, aic = 438.24
##
## $degrees_of_freedom
## [1] 54
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.9929 0.1280 -7.7560 0.0000
## ar2 -0.6617 0.1245 -5.3162 0.0000
## sar1 -0.6507 0.3155 -2.0622 0.0440
## sar2 -0.0927 0.2752 -0.3371 0.7374
## sma1 0.4335 0.4704 0.9215 0.3609
## sma2 -0.5016 0.3342 -1.5007 0.1393
##
## $AIC
## [1] 5.949558
##
## $AICc
## [1] 6.018789
##
## $BIC
## [1] 5.158992
Looking at the residual diagnostics, we seem to have a workable model: the residuals look roughly normally distributed, the ACF of the residuals stays within the 95% confidence bounds, and the Ljung-Box p-values are above 0.05. We can now use this ARIMA model to forecast.
beer.fit.arima <- arima(beer.ts.qtr[,2], order = c(2, 1, 0), seasonal = c(2, 1, 2))
fcst.arima <- forecast(beer.fit.arima, h = 20)
plot(fcst.arima)
Auto.ARIMA
Just like ets(), the auto.arima() function in R selects the best estimated ARIMA model for us (the one with the lowest AIC score).
beer.fit.arima.auto <- auto.arima(beer.ts.qtr[,2])
beer.fit.arima.auto
## Series: beer.ts.qtr[, 2]
## ARIMA(0,1,2)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 sma1
## -1.1509 0.6521 -0.6856
## s.e. 0.1306 0.1965 0.1013
##
## sigma^2 estimated as 145.2: log likelihood=-215.04
## AIC=438.08 AICc=438.88 BIC=446.11
fcst.auto <- forecast(beer.fit.arima.auto, h = 20)
plot(fcst.auto)
We can now compare these two ARIMA models. As we can see, they produce very similar results.
plot(beer.ts.qtr[,2], main = "Quarterly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1960, 1980), ylim = c(200, 700))
lines(fcst.auto$mean, col = "blue", type = "o")
lines(fcst.arima$mean, col = "red", type = "o")
legend("topleft", lty = 1, pch = 1, col = c("blue", "red"),
c("ARIMA(0,1,2)(0,1,1)4", "ARIMA(2,1,0)(2,1,2)4"))
Neural networks are among the most popular machine learning models, loosely inspired by the human brain and nervous system. Lagged values of the time series can be used as inputs to a neural network, much like in an autoregression model.
beer.fit.nn <- nnetar(beer.ts.qtr[,2])
plot(forecast(beer.fit.nn, h = 20), xlab = "Year", ylab = "ML", xlim = c(1960, 1980), ylim = c(200, 700))
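To see which lag structure nnetar() selected, we can simply print the fitted model; it reports an NNAR(p,P,k)[m] specification, where the exact values depend on the data.
beer.fit.nn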
Here are some common measures used to evaluate forecast accuracy, such as ME, RMSE, MAE, MPE, MAPE, and MASE.
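As a quick illustration of what a few of these measures mean, here is a minimal sketch that computes RMSE, MAE, and MAPE by hand; the actual and fcst vectors are purely hypothetical values made up for the example.
actual <- c(150, 140, 160, 155)   # hypothetical observed values
fcst   <- c(145, 142, 150, 160)   # hypothetical forecasts
e <- actual - fcst                # forecast errors
sqrt(mean(e^2))                   # RMSE
mean(abs(e))                      # MAE
mean(100 * abs(e / actual))       # MAPE (in percent)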
We compare forecast accuracy among the four basic forecasting models (simple average, naive, seasonal naive, and drift).
beer.ts.ac <- window(beer.ts, start = 1960, end = 1990)
beer.fit.a <- meanf(beer.ts.ac[,2], h = 60)
beer.fit.n <- naive(beer.ts.ac[,2], h = 60)
beer.fit.sn <- snaive(beer.ts.ac[,2], h = 60)
beer.fit.dri <- rwf(beer.ts.ac[,2], h = 60, drift = TRUE)
plot.ts(beer.ts[,2], main = "Monthly Beer Production in Australia", xlab = "Year", ylab = "ML", xlim = c(1960, 1994))
lines(beer.fit.a$mean, col = "blue")
lines(beer.fit.n$mean, col = "yellow4")
lines(beer.fit.dri$mean, col = "seagreen4")
lines(beer.fit.sn$mean, col = "red")
legend("topleft",lty=1,col=c("blue","yellow4","seagreen4", "red"), cex = 0.75,
legend=c("Mean method","Naive method","Drift Naive method", "Seasonal naive method"))
Based on the graph and our understanding of the dataset, we should expect the seasonal naive method to be the best of the four.
accuracy(beer.fit.a, beer.ts)
## ME RMSE MAE MPE MAPE MASE
## Training set 3.415550e-15 31.87576 25.94337 -5.944666 20.63945 2.554952
## Test set 2.988609e+02 299.34572 298.86094 68.029497 68.02950 NA
## ACF1 Theil's U
## Training set 0.7953819 NA
## Test set 0.9491525 NA
accuracy(beer.fit.n, beer.ts)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2080556 20.22375 15.4775 -0.8056724 11.00421 1.524253
## Test set 267.0000000 267.54252 267.0000 60.7609288 60.76093 NA
## ACF1 Theil's U
## Training set -0.2588384 NA
## Test set 0.9491525 NA
accuracy(beer.fit.sn, beer.ts)
## ME RMSE MAE MPE MAPE MASE
## Training set 2.455014 13.50092 10.15415 1.590586 7.050483 1
## Test set 277.406780 278.48747 277.40678 63.154592 63.154592 NA
## ACF1 Theil's U
## Training set -0.1172677 NA
## Test set 0.6235879 NA
accuracy(beer.fit.dri, beer.ts)
## ME RMSE MAE MPE MAPE MASE
## Training set 4.262953e-15 20.22268 15.4695 -0.9628034 11.00613 1.523465
## Test set 2.607583e+02 261.10685 260.7583 59.3683841 59.36838 NA
## ACF1 Theil's U
## Training set -0.2588384 NA
## Test set 0.9491525 NA
As we can see, the seasonal naive model does have the lowest error scores on the training set.