ARIMA Models

Introduction

Today we learned about the use of autoregressive integrated moving average models with time series data. We first explored the different models that make up the ARIMA model, like the autoregressive model, the moving average model, and the autoregressive moving average model. After understanding each of these models, we were able to combine them into one and explore the ARIMA model.

Autoregressive Model

With this model \(y_t\) will depend on \(y_{t-1}\) and maybe \(y_{t-2},...,y_{t-p}\) An autoregressive model of order p (AR(p)) has the equation: \[y_t=\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+...+\phi_{p}y_{t-p}+\epsilon_t\] where \(y_t\) is stationary, \(\epsilon_t\) is i.i.d. Normal(0,\(\sigma^2\)), and \(\phi_1,...,\phi_p\) are constants and \(\phi_p\) does not equal zero. We will use data from several data sets. We use arima.sim command with a \(\phi\)=0.9 and -0.9. under the order parameter we only enter a 1 for the first value because that means that we have a first order autoregressive model AR(1). We can plot the two models next to each other for comparison.

library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)


par(mfrow=c(2,1))
# phi = .9
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
# phi = -.9
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)

Here we see that the negative \(\phi\) value provides much more variation than the positive. This is due to the negative correlation created by our model. Now we will fit a model using simulated AR(1) data.

set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)

mod1 <- auto.arima(fakedata)
mod1  
## Series: fakedata 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.9266
## s.e.  0.0285
## 
## sigma^2 estimated as 1.004:  log likelihood=-284.66
## AIC=573.32   AICc=573.38   BIC=579.92

Here we see the random time series data from our simulation. Now we can take this data, and use it to forecast values in the future. For this example we will forecast 10 units of time into the future. We can easily see this forecast by plotting the data again, along with our future forecast.

# Now let's forecast using the model we just created
# 1 day into the future
mycast1 <- forecast(mod1, h=10) # includes prediction intervals
mycast1
##     Point Forecast     Lo 80      Hi 80     Lo 95       Hi 95
## 201      -6.557273 -7.841365 -5.2731806 -8.521123 -4.59342324
## 202      -6.075877 -7.826468 -4.3252857 -8.753175 -3.39857879
## 203      -5.629822 -7.698641 -3.5610024 -8.793808 -2.46583554
## 204      -5.216514 -7.523795 -2.9092321 -8.745196 -1.68783106
## 205      -4.833548 -7.327438 -2.3396583 -8.647623 -1.01947281
## 206      -4.478698 -7.122313 -1.8350818 -8.521759 -0.43563620
## 207      -4.149898 -6.915603 -1.3841929 -8.379679  0.07988295
## 208      -3.845237 -6.711618 -0.9788567 -8.228988  0.53851350
## 209      -3.562943 -6.513019 -0.6128660 -8.074695  0.94881016
## 210      -3.301372 -6.321457 -0.2812877 -7.920193  1.31744863
par(mfrow=c(1,1))
plot(mycast1) 

With the forecast command, R provides us with the 80% and 90% confidence interval for our forecast. We can see these two intervals shaded different colors, the 90% is the larger interval. The blue line depicts our best guess based on the intervals calculated.

Moving Average Models

Now we move to moving average where \(y_t\) is a linear combination of \(\epsilon_{t-1},...\epsilon_{t-q}\), this is called a moving average model of order q. The general equation is: \[y_t=\mu+\theta_1\epsilon_{t-1}+\theta_2\epsilon_{t-2}+...+\theta_q\epsilon_{t-q}\] where we assume \(\theta_q\) does not equal zero, and \(\epsilon_t\) is i.i.d. N(0,\(\sigma^2\)). Again we use the arima.sim command, but this time under the order parameter, we only put a value in the third spot, this is the moving average value. For our example we use the order one moving average model, MA(1). This time we have a \(\theta\) of 0.9 and -0.9.

par(mfrow = c(2,1))
plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta ==  +.9)))
abline(0,0)
plot(arima.sim(list(order=c(0,0,1), ma=-.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta ==  -.9)))
abline(0,0)

Again we see the negative theta has the greater variation. Now we will fit a model using simulated MA(1) data.

set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##           ar1    ma1     ma2
##       -0.8354  1.723  0.7496
## s.e.   0.2317  0.237  0.2105
## 
## sigma^2 estimated as 1.143:  log likelihood=-296.35
## AIC=600.71   AICc=600.91   BIC=613.9

Now like before, we can use this model to create a forecast of future values. We will look at plots of both a 1 unit of time forecast, and a 10 units of time forecast. Like before we see the two shaded intervals, as well as the blue line that is the best guess based on the intervals. The R code output also provides us with the confidence interval values.

plot(forecast(mod2, h=1)) # 1 day into the future

forecast(mod2, h=1)
##     Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 201      0.3159447 -1.054301 1.686191 -1.779666 2.411555
plot(forecast(mod2, h=10)) # 10 days into the future

Autoregressive Moving Average

Now there is one last step before we reach the ARIMA model, we will combine the two previous models. It is referred to as the autoregressive moving average model. We combine the two previous equations to get: \[y_t=\phi_1y_{t-1}+...+\phi_py_{t-p}+\epsilon_t+\mu+\theta_1\epsilon_{t-1}+...+\theta_q\epsilon_{t-q}\] and the assumptions are simply the combination of the previous two models. We can again create a simulation and under the order parameter we combine the autoregressive order 1 and the moving average order 1.

set.seed(1234)
arma.data <- arima.sim(list(order=c(1,0,1), ma=.9, ar = .9), n = 200)
mod3 <- auto.arima(arma.data)
mod3
## Series: arma.data 
## ARIMA(4,0,4) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ma1     ma2      ma3      ma4
##       1.1663  -0.5789  0.8841  -0.5949  0.7172  0.0696  -0.4784  -0.6420
## s.e.  0.3502   0.5705  0.3427   0.1475  0.3347  0.2136   0.2604   0.1482
##        mean
##       1.989
## s.e.  0.372
## 
## sigma^2 estimated as 0.9662:  log likelihood=-278.04
## AIC=576.09   AICc=577.25   BIC=609.07

Here we see that the output suggests the we move to an order 4 model. It also provides us with the coefficient values, and the standard errors. Now we can use this model to forecast.

plot(forecast(mod3, h=10)) # 1 day into the future

forecast(mod3, h=1)
##     Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 201       2.299482 1.039752 3.559213 0.3728907 4.226074

We see that the ten day forecast predicts that the values will continue to decrease to a trough, then slightly rise again. We can also look at the R output to see what the exact intervals are. The plot is just a simplistic way to see all of the intervals combined.

ARIMA Model

Now we can combine all we’ve learned so far into one model. We now would use the middle term in the order parameter, which is called the difference. The combination of all three of these values makes up the ARIMA model. We will now create an ARIMA model based on some time series data within R.

data("AirPassengers")
mod7 <- auto.arima(AirPassengers)
mycast7 <- forecast(mod7,h=5)
mod7
## Series: AirPassengers 
## ARIMA(2,1,1)(0,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2      ma1
##       0.5960  0.2143  -0.9819
## s.e.  0.0888  0.0880   0.0292
## 
## sigma^2 estimated as 132.3:  log likelihood=-504.92
## AIC=1017.85   AICc=1018.17   BIC=1029.35
plot(mycast7)

From the AirPassengers data we see that the best ARIMA model is created with an order 2 autoregressive, difference of 1, and and order 1 moving average. We also see the coefficients. The ten unit forecast appears to continue on with the seasonal increasing trend. This model combines the best of all the previous models the create the best.