Using ARIMA

This learning log will discuss how to create and forecast time series data sets using ARIMA. ARIMA is an acronym that refers to AutoRegressive Integrated Moving Average. We will first break down ARIMA into its components and discuss each portion before bringing the model back together at the end.

For the learning log, you’ll want to download the following packages and load up the following libraries:

install.packages(c(“quantmod”, “tseries”, “timeSeries”, “forecast”, “xts”))

library(quantmod) library(tseries) library(timeSeries) library(forecast) library(xts)

Autoregressive Models

Autoregressive models are used when we have reason to believe that the value of our time series at time t, \(y_t\) depends on previous values of the time series, namely \(y_{t-1}\), \(y_{t-2}\), …, \(y_{t-p}\). We can further say that \(y_t\) is a linear combination of these previous y values. Since our current time series value can be derived from our previous observations, autoregressive models allow use to forecast data point points using previous \(y_t\) values.

The equation of the AR(p) model is: \[ y_t = {\phi_1} y_{t-1} + {\phi_2} y_{t-2} + ... + {\phi_p} y_{t-p} + {\epsilon_t} .\]

Where p represents the pth order of the AR model, which tells us the number of previous data points that will be used in predict \(y_t\). Each \({\phi_p}\) is the coefficient used in the linear combination “weighting” of each previous value. We make the assumption that \({\phi_p}\neq0\) otherwise, this turn would not be in the model. We can think about our phi values as correlation figures between \(y_t\) and \(y_{t-p}\) although phi does not have bounds of -1 and 1.

Now that we’ve explained AR(p) models, let’s practice by using the Johnson and Johnson quarterly earnings data set.

library(astsa)
## Warning: package 'astsa' was built under R version 3.3.3
data(jj)

Quickly recall the raw data.

plot(jj, ylab="Quarterly Earnings", main="J&J Quarterly Earnings")

Now we will fit a model using the auto.arima command in the forecast library. All we need to specify is our data source.

library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:astsa':
## 
##     gas
ModARIMA <- auto.arima(jj)
ModARIMA
## Series: jj 
## ARIMA(1,1,2)(0,1,0)[4] 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7921  -0.0970  -0.3945
## s.e.   0.1396   0.1802   0.1580
## 
## sigma^2 estimated as 0.1834:  log likelihood=-44.07
## AIC=96.14   AICc=96.68   BIC=105.61

By looking at the coefficients, we seen that the auto.arima selected a first order AR model with a \({\phi_1}\) equal to -0.7921. This we can then interpret this as \(y_t\) and \(y_{t-1}\) being negatively correlated.

Moving Average Models

Moving average models are used to predict future data points based on previous residuals. Further, our moving average model is a linear combination of our previous error terms, \({\epsilon_{t-1}}\), \({\epsilon_{t-2}}\), …, \({\epsilon_{t-q}}\), which our “weights” for these terms being thetas. Q is then the number of previous errors use in the prediction. We assume that \({\theta_{q}}\) does not equal zero, otherwise the term would be useless.

The qth order MA(q) model then becomes:

\[ y_t = u + {\epsilon_t} + {\theta_1}{\epsilon_{t-1}} + {\theta_2}{\epsilon_{t-2}} + ... + {\theta_q}{\epsilon_{t-q}}.\]

Here, u is the stationary value of our time series.

Now that we’ve explained MA(q) models, we will practice again with the Johnson and Johnson quarterly earnings. We will look at the same model we created before, specifically the ma coefficients.

ModARIMA <- auto.arima(jj)
ModARIMA
## Series: jj 
## ARIMA(1,1,2)(0,1,0)[4] 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7921  -0.0970  -0.3945
## s.e.   0.1396   0.1802   0.1580
## 
## sigma^2 estimated as 0.1834:  log likelihood=-44.07
## AIC=96.14   AICc=96.68   BIC=105.61

From the R output, we see that we would want to use a 2nd order moving average model (q=2) with coefficents \({\theta_{1}}\) equal to -0.0970 and \({\theta_{2}}\) equal to -0.3945.

Incorporating Integration

The I portion of the ARIMA acronym refers to Integration, or helping create a model with a mean value of zero and constant variance over time. Integration does this by changing our \(y_{t-p}\) to difference values. For example, if the difference order is one, the term for \(y_{t-1}\) would be the difference of \(y_{t-2}\) & \(y_{t-1}\). Similar differences can calculated as d increases.

Full ARIMA Model

Now that we’ve discusses Autoregression, Integration, and Moving Averages, we will put a full model together and forecast Johnson and Johnson’s Quarterly Earnings.

Equation & Notation

The notation for ARIMA models is ARIMA(p,d,q) and we still assume that \({\phi_p}\neq0\), \({\theta_{q}}\neq0\), and that our errors are IID and normally distributed with a mean of zero and a variance of sigma squared.

When we have a 0th d order integrated model, we can create the following equation for our model: \(y_t = {\phi_1}y_{t-1} + {\phi_2}y_{t-2} + ... + {\phi_p}y_{t-p} + u + {\epsilon_t} + {\theta_1}{\epsilon_{t-1}} + {\theta_2}{\epsilon_{t-2}} + ... + {\theta_q}{\epsilon_{t-q}}\) which is simply the AR and Ma models combined.

If d becomes nonzero, we have trouble creating a clear model equation, so we will omit that here.

Forecasting Using ARIMA

Using the forecast function in R, and specifying how many periods in the future to forecast using h=, we can forecast a time series. We will look at forecasting 1981-1991 (40 quarters).

JJForecast <- forecast(ModARIMA, h=40)
JJForecast
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1981 Q1       17.78046 17.23158 18.32934 16.94102 18.61990
## 1981 Q2       16.25397 15.70173 16.80621 15.40939 17.09855
## 1981 Q3       17.60119 17.00261 18.19976 16.68575 18.51663
## 1981 Q4       13.19339 12.58713 13.79965 12.26620 14.12058
## 1982 Q1       19.36210 18.39633 20.32788 17.88507 20.83913
## 1982 Q2       17.83700 16.85472 18.81927 16.33474 19.33926
## 1982 Q3       19.18312 18.11629 20.24995 17.55155 20.81469
## 1982 Q4       14.77619 13.68516 15.86722 13.10761 16.44478
## 1983 Q1       20.94422 19.51313 22.37530 18.75556 23.13287
## 1983 Q2       19.41966 17.95316 20.88615 17.17685 21.66246
## 1983 Q3       20.76535 19.18698 22.34371 18.35145 23.17924
## 1983 Q4       16.35876 14.73645 17.98107 13.87765 18.83987
## 1984 Q1       22.52651 20.57250 24.48052 19.53811 25.51491
## 1984 Q2       21.00217 18.99237 23.01197 17.92844 24.07590
## 1984 Q3       22.34769 20.20461 24.49077 19.07013 25.62525
## 1984 Q4       17.94124 15.73427 20.14821 14.56597 21.31650
## 1985 Q1       24.10888 21.57360 26.64417 20.23150 27.98627
## 1985 Q2       22.58462 19.97374 25.19551 18.59162 26.57763
## 1985 Q3       23.93008 21.16793 26.69223 19.70573 28.15442
## 1985 Q4       19.52368 16.67870 22.36866 15.17266 23.87470
## 1986 Q1       25.69128 22.51843 28.86413 20.83883 30.54374
## 1986 Q2       24.16706 20.90007 27.43404 19.17064 29.16348
## 1986 Q3       25.51248 22.07846 28.94651 20.26060 30.76437
## 1986 Q4       21.10610 17.57163 24.64058 15.70059 26.51162
## 1987 Q1       27.27369 23.40991 31.13747 21.36455 33.18284
## 1987 Q2       25.74948 21.77441 29.72454 19.67014 31.82882
## 1987 Q3       27.09490 22.93858 31.25121 20.73836 33.45144
## 1987 Q4       22.68853 18.41542 26.96163 16.15337 29.22368
## 1988 Q1       28.85611 24.25102 33.46119 21.81324 35.89897
## 1988 Q2       27.33190 22.59967 32.06413 20.09457 34.56922
## 1988 Q3       28.67731 23.75079 33.60383 21.14285 36.21177
## 1988 Q4       24.27094 19.21247 29.32942 16.53468 32.00721
## 1989 Q1       30.43852 25.04456 35.83248 22.18917 38.68787
## 1989 Q2       28.91432 23.37847 34.45016 20.44798 37.38066
## 1989 Q3       30.25973 24.51750 36.00196 21.47774 39.04171
## 1989 Q4       25.85336 19.96502 31.74170 16.84793 34.85880
## 1990 Q1       32.02094 25.79303 38.24885 22.49618 41.54570
## 1990 Q2       30.49673 24.11318 36.88028 20.73394 40.25953
## 1990 Q3       31.84214 25.24086 38.44343 21.74636 41.93793
## 1990 Q4       27.43578 20.67512 34.19644 17.09624 37.77532
plot(JJForecast)

The R output provides both 80% and 95% confidence intervals for each forecast. The darker gray/blue is the 80% interval and the lighter gray is the 95% interval.