Learning ARIMAs

Today in class we talked about ARIMA models. ARIMA stands for autoregressive integrated moving average. We broke the model into its components to learn about each one, and then combined them all to forecast. We will do the same here.

Autoregressive

An autoregressive model writes \(Y_t\) as a linear combination of its own past values \(Y_{t-1}, \ldots, Y_{t-p}\) plus a random error. It is written AR(p), where p is the order of the model. For example, AR(1) is \(Y_t = \phi_1 Y_{t-1} + \epsilon_t\). There are some conditions that have to hold:

  1. \(Y_t\) is stationary
  2. \(\epsilon_t\) is iid \(N(0, \sigma^2)\)
  3. \(\phi_1, \ldots, \phi_p\) are constants, with \(\phi_p \neq 0\)
  4. \(E(Y_t) = 0\). If not, replace \(Y_t\) with \(Y_t - E(Y_t)\)
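
To see what an AR(1) series looks like, here is a minimal sketch in base R that builds one straight from the equation and then checks that ‘arima’ recovers the coefficient. The value 0.6 for \(\phi_1\), the series length, and the seed are arbitrary choices for illustration, not values from class.

```r
# Simulate AR(1): Y_t = phi1 * Y_{t-1} + eps_t, with eps_t iid N(0, 1).
# phi1 = 0.6 and n = 500 are assumed values chosen only for this sketch.
set.seed(1)
n    <- 500
phi1 <- 0.6
eps  <- rnorm(n)

y <- numeric(n)            # y[1] starts at 0
for (t in 2:n) {
  y[t] <- phi1 * y[t - 1] + eps[t]
}

# Fit an AR(1) with no mean term, matching condition 4 (E(Y_t) = 0).
fit      <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
phi1_hat <- unname(coef(fit)["ar1"])
phi1_hat                   # should land near 0.6
```

Because \(|\phi_1| < 1\), the simulated series is stationary, which is condition 1 in the list.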

Moving Average Models

Here, \(Y_t\) is a linear combination of the current and past error terms \(\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}\). The moving average model uses q as its order and is abbreviated MA(q). For example, MA(1) is \(Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1}\), where \(\mu\) is the mean of the series.
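
As a rough illustration, the MA(1) equation can be simulated directly in base R. The values of \(\mu\) and \(\theta_1\) below are made up for the sketch. One known property of an MA(1) is that its lag-1 autocorrelation is \(\theta_1 / (1 + \theta_1^2)\) and its autocorrelations at lags 2 and beyond are zero, so the sample values should come out close to that.

```r
# Simulate MA(1): Y_t = mu + eps_t + theta1 * eps_{t-1}.
# mu = 5 and theta1 = 0.7 are assumed values chosen only for this sketch.
set.seed(2)
n      <- 2000
mu     <- 5
theta1 <- 0.7
eps    <- rnorm(n + 1)               # one extra draw to supply eps_0

# eps[-1] holds eps_1..eps_n; eps[-(n + 1)] holds eps_0..eps_{n-1}.
y <- mu + eps[-1] + theta1 * eps[-(n + 1)]

# Sample autocorrelations at lags 1, 2, 3 (element 1 of $acf is lag 0).
rho        <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]
rho1_exact <- theta1 / (1 + theta1^2)
rho
rho1_exact
```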

Autoregressive Moving Average Models

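Here, we combine the last two sections. The model is abbreviated ARMA(p, q). Again, we look at the first-order example, ARMA(1,1): \(Y_t = \mu + \phi_1 Y_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1}\). The AR part contributes \(\phi_1 Y_{t-1}\), the MA part contributes \(\theta_1 \epsilon_{t-1}\), and both share the same constant \(\mu\) and error term \(\epsilon_t\).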
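
Base R can also simulate an ARMA series for us with ‘arima.sim’. The sketch below sets the constant to zero for simplicity and uses made-up coefficients (0.5 and 0.4), then fits an ARMA(1,1) to check that both come back roughly right.

```r
# Simulate a zero-mean ARMA(1,1) with stats::arima.sim.
# phi1 = 0.5 and theta1 = 0.4 are assumed values chosen only for this sketch.
set.seed(3)
phi1   <- 0.5
theta1 <- 0.4
y <- arima.sim(model = list(ar = phi1, ma = theta1), n = 2000)

# order = c(p, d, q); d = 0 because no differencing is needed here.
fit <- arima(y, order = c(1, 0, 1), include.mean = FALSE)
est <- coef(fit)           # named vector with entries "ar1" and "ma1"
est
```

R's ‘arima’ uses the same sign convention for the MA term as the equation above, so ‘ma1’ can be read directly as \(\theta_1\).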

Integrated

This is also known as “differencing.” When forecasting with AR and MA terms, we want the series to have a constant mean and constant variance over time. If it doesn’t, we can difference it, replacing \(Y_t\) with \(Y_t - Y_{t-1}\), which mainly helps with a mean that changes over time. Even though we can do this by hand, there is no need to. We will talk more about this when combining all of the pieces. The number of times we difference is the “d” in the ARIMA model.
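
For a quick feel for what differencing does, here is a small sketch using base R's ‘diff’. The series is a simulated random walk with drift (the drift of 0.1 is an arbitrary choice), so its level wanders upward, but its first difference has a steady mean.

```r
# A random walk with drift: Y_t = Y_{t-1} + drift + eps_t.
# drift = 0.1 is an assumed value chosen only for this sketch.
set.seed(4)
n     <- 500
drift <- 0.1
y     <- cumsum(drift + rnorm(n))

# First difference: dy[t] = y[t + 1] - y[t], one observation shorter than y.
dy <- diff(y)

# The level changes a lot between halves of the series...
c(mean(y[1:250]), mean(y[251:500]))
# ...while the differenced series has a stable mean near the drift.
mean(dy)
```

Differencing twice (d = 2) would be `diff(y, differences = 2)`.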

ARIMA Put Together

We can combine all of the previous sections to get our final ARIMA model, abbreviated ARIMA(p, d, q). Luckily, we don’t have to do all of this by hand; R can find the orders for us. The function ‘auto.arima’ from the forecast package automatically decides whether differencing is needed, how many AR and MA terms to use, etc. We can practice with an actual dataset and forecast into the future.
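
To give a rough idea of what an automatic search involves, the sketch below tries a small grid of p and q values with d fixed at 1 and keeps the fit with the lowest AIC. This is a simplified stand-in written with base R only, not what ‘auto.arima’ actually does internally; as far as I understand, the real function chooses d with a unit-root test and searches p and q stepwise using AICc by default. The simulated series and the grid size are assumptions for illustration.

```r
# Simulated series that needs one difference: the cumulative sum of an AR(1).
set.seed(5)
y <- cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = 300)))

# Candidate orders: p and q each in 0..2, with d fixed at 1 (an assumption).
candidates     <- expand.grid(p = 0:2, q = 0:2)
candidates$aic <- NA_real_

for (i in seq_len(nrow(candidates))) {
  # Some orders may fail to fit; leave their AIC as NA in that case.
  fit <- tryCatch(
    arima(y, order = c(candidates$p[i], 1, candidates$q[i])),
    error = function(e) NULL
  )
  if (!is.null(fit)) candidates$aic[i] <- AIC(fit)
}

# Keep the candidate with the smallest AIC (which.min skips NA values).
best <- candidates[which.min(candidates$aic), ]
best
```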

The Example

We can look at the data for global temperature. As always, we have to load the libraries and say which data set we want to use.

library(astsa)
## Warning: package 'astsa' was built under R version 3.4.3
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:astsa':
## 
##     gas
data("globtemp")

Next, we want R to fit an ARIMA for us, so we call the ‘auto.arima’ function and tell R which dataset to fit it to.

model <- auto.arima(globtemp)
model
## Series: globtemp 
## ARIMA(1,1,1) with drift 
## 
## Coefficients:
##          ar1      ma1   drift
##       0.3549  -0.7663  0.0072
## s.e.  0.1314   0.0874  0.0032
## 
## sigma^2 estimated as 0.01011:  log likelihood=119.88
## AIC=-231.76   AICc=-231.46   BIC=-220.14

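R tells us to use a p of 1, a d of 1, and a q of 1, plus a drift term.

We can use the coefficients of the model to write out the fitted equation. Because d = 1, the ARMA part applies to the differenced series \(W_t = Y_t - Y_{t-1}\) rather than to \(Y_t\) itself, and the drift of 0.0072 is the mean of \(W_t\): \(W_t - 0.0072 = 0.3549 (W_{t-1} - 0.0072) + \epsilon_t - 0.7663 \epsilon_{t-1}\).

We can use the functions ‘forecast’ and ‘plot’ to see how the data points look in the future. The “h” in forecast tells R how many time steps (here, years) into the future we want to predict.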

myForecastModel <- forecast(model, h = 10)
myForecastModel
##      Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## 2016      0.8031838 0.6743294 0.9320383 0.6061180 1.000250
## 2017      0.7841177 0.6346035 0.9336320 0.5554555 1.012780
## 2018      0.7819961 0.6219774 0.9420147 0.5372687 1.026723
## 2019      0.7858872 0.6181356 0.9536388 0.5293333 1.042441
## 2020      0.7919120 0.6174348 0.9663893 0.5250721 1.058752
## 2021      0.7986940 0.6179620 0.9794260 0.5222882 1.075100
## 2022      0.8057447 0.6190423 0.9924470 0.5202080 1.091281
## 2023      0.8128907 0.6204288 1.0053525 0.5185456 1.107236
## 2024      0.8200705 0.6220253 1.0181156 0.5171866 1.122954
## 2025      0.8272623 0.6237901 1.0307345 0.5160785 1.138446
plot(myForecastModel)

If we want to see the forecast in more detail, we can just run “myForecastModel”. That gives the point forecast, the 80% prediction interval, and the 95% prediction interval for each year. On the plot, the bright blue line shows the point forecast, the darker blue shading shows the 80% interval, and the lighter blue shows the 95% interval. Note that the intervals are wide and get wider the further ahead we forecast: the 95% interval for 2025 runs from about 0.52 to 1.14. Forecasting is hard, so we can’t be very confident in a narrow range.
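
As a sanity check on where the interval numbers come from, the first row of the table can be roughly reproduced by hand. Assuming normally distributed errors, the one-step-ahead interval is the point forecast plus or minus a normal quantile times \(\hat{\sigma}\), the square root of the ‘sigma^2’ value in the model output. The sketch below uses only the rounded numbers printed above, so it matches the table to about three decimals rather than exactly. For later years the standard error grows, so this shortcut applies to 2016 only.

```r
# Values copied from the printed output above (rounded as shown there).
point_2016 <- 0.8031838      # point forecast for 2016
sigma2     <- 0.01011        # "sigma^2 estimated as 0.01011"

se_1 <- sqrt(sigma2)         # one-step-ahead forecast standard error

# Two-sided normal quantiles: 80% leaves 10% in each tail, 95% leaves 2.5%.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

interval_80 <- point_2016 + c(-1, 1) * z80 * se_1
interval_95 <- point_2016 + c(-1, 1) * z95 * se_1

round(interval_80, 3)        # compare with Lo 80 / Hi 80 for 2016
round(interval_95, 3)        # compare with Lo 95 / Hi 95 for 2016
```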