ARIMA Models

In Tuesday’s class, we started learning about ARIMA models. ARIMA stands for autoregressive integrated moving averages and is a combination of autoregressive, integrated, and moving average models.

We fit a few models in class and used them to do predictions.

We have to first call up packages so we can use ARIMA functions:

library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.4
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(xts)

Autoregressive Models

We began by talking about autoregressive models, which is where we look at \(y_t\) as a linear combination of \[y_{(t-1)}\], …, \[y_{(t-p)}\]. Autoregressive models are models of order p and have the following form:

\[{y_t} = {\phi_1}{y_{(t-1)}} + {\phi_2}{y_{(t-2)}} + ... + {\phi_p}{y_{(t-p)}} + {\epsilon_t}\]

We say that \({y_t}\) is stationary, \({\epsilon_t}\) is normal with mean zero and variance sigma squared, \({\phi_1} ... {\phi_p}\) are constants, and \({\phi_p}\) does not equal 0.

We will first look at AR(1), which is an autoregressive model with p equal to 1. This means that \({y_t}\) depends on the previous 1 observation. We will create two models: one with \({\phi}\) equal to .9 and the other with \({\phi}\) equal to -.9. When it equals .9, this means that we have a positive correlation between \({y_t}\) and \({y_{t-1}}\). When it equals -.9, the sign of our correlation alternates between every other observation, which makes our graph look a bit more choppy.

We will create this model by simulation, so we use the arima.sim function. Our first model we set \({\phi}\) equal to .9 with the ‘ar’ argument, and we set it equal to -.9 in the second model. We want to create a model with p equal to 1, so we specify this in the first spot of of our order list, as follows:

par(mfrow=c(2,1))
# phi = .9
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
# phi = -.9
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)

We can see that the graph with negative correlation is more wavering, as we were expecting.

We will now simulate fake data and fit our model with the auto.arima function.

set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)

mod1 <- auto.arima(fakedata)
mod1  
## Series: fakedata 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.9266
## s.e.  0.0285
## 
## sigma^2 estimated as 1.004:  log likelihood=-284.66
## AIC=573.32   AICc=573.38   BIC=579.92

In the output, we are given a point estimate .9266, which we then use to create the following equation:

\[{y_t} = .9266{y_{t-1}} + {\epsilon_t}\]

We will forecast one day into the future. We specify that we want one day into to the future with ‘h = 1’

mycast1 <- forecast(mod1, h=1) # includes prediction intervals
plot(mycast1) 

We can also forecast further into the future. We will forecast 10 days into the future:

plot(forecast(mod1, h=10))

In this graph, we see our point predictions represented by the dark blue line. The darker blue region represents a smaller prediction interval and the lighter blue region is a wider prediction interval. The interval is very wide simply because predicting is very hard.

Moving Average Models

We will now move onto moving average models. In this type of model, \({y_t}\) is a linear combination of \({\epsilon_{t}, ..., {\epsilon_{t-q}}}\). The equation for this model is as follows:

\[{y_t} = {\mu} + {\epsilon_t} + {\theta_1}{\epsilon_{t-1}} + ... + {\theta_q}{\epsilon_{t-q}}\] In this equation, we assume that \({\epsilon_{t-q}}\) does not equal 0.

Similar to our AR(1) model, we simulate our MA(1) model:

par(mfrow = c(2,1))
plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta ==  +.9)))
plot(arima.sim(list(order=c(0,0,1), ma=-.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta ==  -.9)))

We then fit our model with the simulated MA(1) model. We set q equal to 1 by specifying the third number in our list:

set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##           ar1    ma1     ma2
##       -0.8354  1.723  0.7496
## s.e.   0.2317  0.237  0.2105
## 
## sigma^2 estimated as 1.143:  log likelihood=-296.35
## AIC=600.71   AICc=600.91   BIC=613.9

Now we forecast 1 and 10 days into the future using our model:

plot(forecast(mod2, h=1)) # 1 day into the future

forecast(mod2, h=1)
##     Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 201      0.3159447 -1.054301 1.686191 -1.779666 2.411555
plot(forecast(mod2, h=10))

We can again see point prediction and prediction intervals represented on our graph in blue.

Autoregressive Moving Average Model

We will now combine the previous two models into an autoregressive moving average model. The equation for this ARMA model is as follows:

\[{y_t} = {\phi_1}{y_{t-1}} + ... + {\phi_p}{y_{t-p}} + {\epsilon_t} + {\mu} + {\theta_1}{\epsilon_{t-1}} + ... + {\theta_q}{\epsilon_{t-q}}\]

Again, we create our simulated model and then plot our predictions:

set.seed(1234)
arma.data <- arima.sim(list(order=c(1,0,1), ma=.9, ar = .9), n = 200)
mod3 <- auto.arima(arma.data)
mod3
## Series: arma.data 
## ARIMA(4,0,4) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ma1     ma2      ma3      ma4
##       1.1663  -0.5789  0.8841  -0.5949  0.7172  0.0696  -0.4784  -0.6420
## s.e.  0.3502   0.5705  0.3427   0.1475  0.3347  0.2136   0.2604   0.1482
##        mean
##       1.989
## s.e.  0.372
## 
## sigma^2 estimated as 0.9662:  log likelihood=-278.04
## AIC=576.09   AICc=577.25   BIC=609.07
plot(forecast(mod3, h=1)) 

forecast(mod3, h=1)
##     Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 201       2.299482 1.039752 3.559213 0.3728907 4.226074
plot(forecast(mod3, h=10))

Including Differences

We have now made it to an ARIMA model, where the “I” stands for integrated, otherwise known as differences.

In this model, want zero mean and constant variance over time. If we don’t see this, take differences between consecutive measures until we do.

We start with d=1, and this will calculate first order differenes: y(2)-y(1), y(3)-y(2), etc. If we don’t see zero mean and constant variance, we next try d=2, which are second order differences. These are the differences between (y(2)-y(1)) and (y(3)-Y(2)), ((y(3)-y(2) and (y(4)-Y(3)) etc. We can test for zero mean and constant variance using adf.test() to do it by hand, otherwise auto.arima will perform this for us.

We can start with plotting a basic example:

plot(WWWusage)

And we now fit a model for this data using the auto.arima function again:

mod4 <- auto.arima(WWWusage)
mod4
## Series: WWWusage 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1     ma1
##       0.6504  0.5256
## s.e.  0.0842  0.0896
## 
## sigma^2 estimated as 9.995:  log likelihood=-254.15
## AIC=514.3   AICc=514.55   BIC=522.08

In the upper left corner of our output, we see ARIMA(1,1,1) this means the p = q = 1 and we used first order differences.

We use this model to forecast 20 days into the future:

mycast <- forecast(mod4,h=20)
plot(mycast)

That’s it!