In Thursday’s class, we continued talking about ARIMA models, but this time we looked at adding seasonality. We do this when we see a seasonal pattern in our data. The form for this type of model is ARIMA(p,d,q) x (P,D,Q)S, where lowercase p, d, and q are the nonseasonal AR, differencing, and MA orders, and capital P, D, and Q are the corresponding seasonal orders. S is the length of one seasonal cycle; for example, S = 12 means monthly data with a yearly seasonal pattern.
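In backshift notation (with \(B y_t = y_{t-1}\)), this model can be written compactly as
\[
\phi_p(B)\,\Phi_P(B^S)\,(1-B)^d\,(1-B^S)^D\, y_t = \theta_q(B)\,\Theta_Q(B^S)\,\varepsilon_t,
\]
where \(\phi_p\) and \(\theta_q\) are the nonseasonal AR and MA polynomials, \(\Phi_P\) and \(\Theta_Q\) are their seasonal counterparts, and \(\varepsilon_t\) is white noise.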
We will now look at an example of how to create a seasonal ARIMA model by hand instead of using the auto.arima() function.
library(forecast)
library(quantmod)
library(tseries)
library(timeSeries)
library(xts)
data(wineind)
We are going to use the wineind dataset of Australian total wine sales from January 1980 to August 1994; we drop the first two years of observations before modeling.
We begin by looking at a plot of our data.
wineind <- wineind[-c(1:24)]  # drop the first 24 months (1980-1981)
wineind <- ts(wineind, frequency = 12, start = c(1982, 1))  # trimmed series now starts January 1982
par(mfrow = c(2, 1))  # two stacked panels: raw series above, differenced series below
plot.ts(wineind)
Because the series appears to have roughly constant variance over time, we will not transform it.
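If the variance did grow with the level of the series, a log or Box-Cox transformation would be the usual remedy. A minimal sketch using the forecast package, shown for illustration only since we do not transform here:
lambda <- BoxCox.lambda(wineind)  # estimate a variance-stabilizing lambda
wineind_bc <- BoxCox(wineind, lambda)  # lambda = 0 corresponds to a log transform
plot.ts(wineind_bc)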
We will now determine what order of differencing our model needs. We do this using the diff() function, passing our data set and a lag as arguments. Setting the lag to 12 takes a seasonal difference, comparing each observation with the one twelve months earlier:
diff1 <- diff(wineind, lag = 12)  # seasonal (lag-12) difference
plot.ts(diff1)
adf.test(diff1)  # augmented Dickey-Fuller test for stationarity
## Warning in adf.test(diff1): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -4.0358, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
We also ran adf.test() to check whether the differenced series is stationary. The test came back with a low p-value of 0.01, so we reject the null hypothesis of non-stationarity in favor of the alternative that this series is stationary. Since one seasonal difference was enough, we use d = 0 and D = 1.
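As a cross-check on these choices, the forecast package can estimate the number of differences directly; a quick sketch (we did not run this as part of the analysis above):
ndiffs(wineind)  # suggested number of nonseasonal differences d
nsdiffs(wineind)  # suggested number of seasonal differences D (needs a seasonal ts; frequency is 12 here)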
To choose our values of p, q, P, and Q, we use the autocorrelation function (ACF) and partial autocorrelation function (PACF).
To determine our value of q, we look for significant spikes at low lags of the ACF; to determine Q, we look for spikes at lags that are multiples of S in the ACF. Similarly, p comes from spikes at low lags of the PACF, and P from spikes at lags that are multiples of S in the PACF.
Acf(diff1)
Pacf(diff1)
Looking at our ACF graph, we see no significant spikes at low lags, so we determine that q = 0. We do have a spike at lag 12, however, so we set Q = 1.
Looking at our PACF graph, we likewise see no significant spikes at low lags, so we determine that p = 0; the spike at lag 12 leads us to set P = 1.
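As an aside, forecast's tsdisplay() draws the series, ACF, and PACF in a single figure, which makes this kind of inspection quicker:
tsdisplay(diff1, main = "Seasonally differenced wine sales")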
Fitting the model is now straightforward: we create it with our determined p, d, q, P, D, Q, and S values, as follows:
mod1 <- arima(wineind, order = c(0,0,0), seasonal = list(order = c(1,1,1), period = 12))
We will now look at the specifics of the model we have created. To run diagnostics, we analyze the residuals using a hypothesis test based on the Ljung-Box statistic.
If our ARIMA model is doing its job, it should account for the dependence among the \(y_t\) values, leaving residuals that are small and uncorrelated; the Ljung-Box statistic will then be small too.
In this test, our null hypothesis is that the model is adequate. The alternative hypothesis is that it is inadequate (meaning the residuals are still autocorrelated, which produces a large test statistic).
If we reject the null hypothesis, we go back to the drawing board and revisit the ACF and PACF to improve the model (to find better p, d, q, P, D, Q).
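Concretely, for \(n\) residuals with sample autocorrelations \(\hat{\rho}_k\), the Ljung-Box statistic over the first \(h\) lags is
\[
Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^{\,2}}{n-k},
\]
which is approximately \(\chi^2\)-distributed under the null hypothesis (Box.test's fitdf argument can reduce the degrees of freedom to account for estimated parameters). Large residual autocorrelations inflate \(Q\) and drive the p-value down.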
summary(mod1)
##
## Call:
## arima(x = wineind, order = c(0, 0, 0), seasonal = list(order = c(1, 1, 1), period = 12))
##
## Coefficients:
## sar1 sma1
## 0.3807 -0.9305
## s.e. 0.1344 0.2266
##
## sigma^2 estimated as 5796736: log likelihood = -1295.62, aic = 2597.23
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 421.0175 2310.659 1698.323 0.8636952 6.650661 0.3376614
## ACF1
## Training set 0.03255104
Box.test(mod1$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod1$residuals
## X-squared = 0.16425, df = 1, p-value = 0.6853
Based on the large p-value of this test (0.6853), we fail to reject the null hypothesis: the model appears adequate, meaning the residuals behave like white noise. The sar1 estimate (0.3807) is the smaller of the two seasonal coefficients, though, so we will try removing it to see whether a simpler model suffices:
mod2 <- arima(wineind, order = c(0,0,0), seasonal = list(order = c(0,1,1), period = 12))
summary(mod2)
##
## Call:
## arima(x = wineind, order = c(0, 0, 0), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Coefficients:
## sma1
## -0.5330
## s.e. 0.1232
##
## sigma^2 estimated as 6486448: log likelihood = -1298.62, aic = 2601.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 333.4408 2444.26 1821.747 0.5345291 7.190648 0.3622007
## ACF1
## Training set 0.04947764
Box.test(mod2$residuals, type = "Ljung")  # check the simplified model's residuals
This also looks good: mod2's residuals show little first-order autocorrelation (its ACF1 is only about 0.05 in the summary above).
Out of curiosity, we will compare our results with those of the automated process:
automod <- auto.arima(wineind)
automod
## Series: wineind
## ARIMA(3,0,2)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 sma1
## 0.1275 0.5764 0.2441 -0.1338 -0.6116 -0.6845
## s.e. 0.2519 0.2594 0.0916 0.2524 0.2305 0.0896
##
## sigma^2 estimated as 5797471: log likelihood=-1289.33
## AIC=2592.65 AICc=2593.5 BIC=2613.24
AIC(automod); AIC(mod2)
## [1] 2592.653
## [1] 2601.245
Box.test(automod$residuals)
##
## Box-Pierce test
##
## data: automod$residuals
## X-squared = 0.03366, df = 1, p-value = 0.8544
Judging by AIC (2601.245 for our model versus 2592.653 for auto.arima's), we were pretty close!
par(mfrow=c(2,1))
plot(forecast(automod, h = 12), main = "auto.arima")
plot(forecast(mod2, h = 12), main = "home-cooked")
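The forecast objects also contain the numeric predictions if we want them rather than the plots; a small sketch:
fc <- forecast(mod2, h = 12)  # 12-month-ahead forecasts
fc$mean  # point forecasts
fc$lower  # lower prediction-interval bounds (80% and 95% by default)
fc$upper  # upper bounds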
In class, we also learned about the decompose() function in R, which splits a time series into trend, seasonal, and remainder components. We will use it on our dataset:
dec <- decompose(wineind)
plot(dec)
In the above graph, the first panel shows our observed data. The second shows the estimated overall trend, which does not appear very pronounced here. The third panel shows the seasonal component, and the fourth shows the residuals.
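Note that decompose() estimates the trend with simple moving averages; stl() in base R's stats package performs a similar loess-based decomposition that is often more flexible. A sketch:
plot(stl(wineind, s.window = "periodic"))  # loess-based seasonal-trend decomposition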