In the previous lecture we learned how a regression model can capture trend and seasonality
In this lecture we are going to learn how a regression model can be used to quantify the correlation between neighboring values in a time series (called autocorrelation)
This type of model, called an autoregressive (AR) model, is useful for improving forecast accuracy by making use of the information contained in the autocorrelation (beyond trend and seasonality)
If we know that values tend to be followed by similar values (positive autocorrelation) or by very different values (negative autocorrelation), then we can use that information to adjust forecasts
A stationary time series is one whose properties do not depend on the time at which the series is observed
Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.
On the other hand, a white noise series is stationary — it does not matter when you observe it, it should look much the same at any point in time
In general, a stationary time series will have no predictable patterns in the long-term
Time plots will show the series to be roughly horizontal (although some cyclic behavior is possible), with constant variance
More formally, we say that a time series is stationary if the joint (multivariate) cumulative distribution function of the series does not change over time
This is what we call strong stationarity.
A simpler and more practical definition is what we call weak stationarity.
A time series is said to be weakly stationary if its mean is constant over time, its variance is finite and constant over time, and the covariance between \(y_t\) and \(y_{t+k}\) depends only on the lag \(k\) and not on \(t\)
#Simulated white noise: a stationary series
y <- rnorm(1000, 0, 1)
plot.ts(y, main="A stationary time series")

#Simulated series with a linear trend and increasing variance
x <- 1:1000
y <- rnorm(1000, 0, 1*x)
z <- 2*x + y
plot.ts(z, main="A non-stationary series")
Which of these series is stationary?
One simple way to make a non-stationary time series stationary is to compute the differences between consecutive observations (known as differencing)
Differencing can help stabilise the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend and seasonality: \(y_t^{'}=y_t - y_{t-1}\)
Occasionally the differenced data will not appear to be stationary and it may be necessary to difference the data a second time to obtain a stationary series \[\begin{align} y_t^{''} &= y_t^{'} - y_{t-1}^{'} \notag \\ &= (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) \notag \\ &= y_t -2y_{t-1} + y_{t-2} \notag \end{align}\]
Transformations such as logarithms can also help to stabilize the variance of a time series
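As a small illustration (a sketch assuming the goog daily closing-price series from the fpp2 package; any trending series works), first and second differences can be computed with diff(), and a log transformation can be applied before differencing:

library(fpp2)

#First differences: y'_t = y_t - y_{t-1}
goog_d1 <- diff(goog)

#Second-order differences: y''_t = y_t - 2*y_{t-1} + y_{t-2}
goog_d2 <- diff(goog, differences = 2)

#Differencing the logged series also helps stabilise the variance
goog_dlog <- diff(log(goog))

autoplot(goog_d1) + ggtitle("First differences of goog")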
A seasonal difference is the difference between an observation and the previous observation from the same season, \(y_t^{'} = y_t - y_{t-m}\), where \(m\) is the number of seasons in a year (the seasonal period, e.g. \(m = 12\) for monthly data)
These are also called lag-m differences, as we subtract the observation after a lag of \(m\) periods
Sometimes it is necessary to take both a seasonal difference and a first difference to obtain stationary data
cbind("Billion kWh" = usmelec,
"Logs" = log(usmelec),
"Seasonally\n differenced logs" =
diff(log(usmelec),12),
"Doubly\n differenced logs" =
diff(diff(log(usmelec),12),1)) %>%
autoplot(facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("Monthly US net electricity generation")
If a time series can be made stationary by differencing, it is said to contain a unit root
One way to determine more objectively whether differencing is required is to use a unit root test
These are statistical hypothesis tests of stationarity that are designed for determining whether differencing is required
A number of unit root tests are available, which are based on different assumptions and may lead to conflicting answers
Augmented Dickey-Fuller (ADF) test (Said & Dickey, 1984)
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (Kwiatkowski, Phillips, Schmidt, & Shin, 1992)
Elliott-Rothenberg-Stock (ERS) test (Elliott, Rothenberg, & Stock, 1996)
library(urca)
library(fpp2)
goog %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 10.7223
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic is greater than the 1% critical value, so the null hypothesis of stationarity is rejected; that is, the data are not stationary and differencing is required
We apply differencing and re-run the KPSS test. As shown below, the test statistic is now small and well within the range expected for stationary data, so the differenced series appears stationary.
#Performs the KPSS test on the differenced series
goog %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 0.0324
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
#The ADF test has the opposite null hypothesis (the series is non-stationary);
#a large p-value means we cannot reject non-stationarity
library(tseries)
goog %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -2.5417, Lag order = 9, p-value = 0.349
## alternative hypothesis: stationary
#Suggests the number of first differences required to make the series stationary
ndiffs(goog)
## [1] 1
A related function, nsdiffs(), suggests the number of seasonal differences required using a measure of seasonal strength \(F_S\): no seasonal difference is suggested if \(F_S < 0.64\), otherwise one seasonal difference is suggested
We can apply nsdiffs() to the logged US monthly electricity data
#Determines the number of seasonal differencing
usmelec %>% log() %>% nsdiffs()
## [1] 1
#Determines the number of differencing
#on the seasonally differenced series
usmelec %>% log() %>% diff(lag=12) %>% ndiffs()
## [1] 1
In time series data, values in neighboring periods tend to be correlated
Such correlation, called autocorrelation, is informative and can help in improving forecasts
If values tend to be followed by similar values, then we have positive autocorrelation
If values tend to be followed by very different values, then we have negative autocorrelation
We use that information to adjust forecasts
We will now discuss how to compute the autocorrelation of a series and how best to use that information to improve forecasts
To compute autocorrelation, we compute the correlation between the series and a lagged version of the series
This is illustrated in the following code chunk.
amtrak <- read.csv("Amtrak.csv")
#Time series containing the first 24 months
amtrak.ts <- ts(amtrak$Ridership,
                start=c(1991,1),
                end=c(1992,12),
                freq=12)
#Generates the lag-1 series using the dplyr package
lag1 <- dplyr::lag(as.data.frame(amtrak.ts), 1)
#Creates a data frame containing the series and its lag-1 version
cordat <- cbind(as.data.frame(amtrak.ts), lag1)
#Computes the Pearson correlation (the lag-1 autocorrelation)
cor(cordat, use="complete.obs")
## x x
## x 1.00000000 0.08088924
## x 0.08088924 1.00000000
#Plots the autocorrelation function (ACF) up to lag 12
Acf(amtrak.ts, lag.max=12, main="")
A few typical autocorrelation behaviors that are useful to explore are:
Strong autocorrelation (positive or negative) at multiples of a lag larger than 1 typically reflects a cyclical pattern. For example, strong positive autocorrelation at lags 12, 24, 36, ... in monthly data will reflect an annual seasonality (where values during a given month each year are positively correlated).
Positive lag-1 autocorrelation (called “stickiness”) describes a series where consecutive values move generally in the same direction. In the presence of a strong linear trend, we would expect to see a strong positive lag-1 autocorrelation.
Negative lag-1 autocorrelation reflects swings in the series, where high values are immediately followed by low values and vice versa.
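As a quick illustration of these patterns, the following sketch (using the built-in AirPassengers monthly series, chosen here only as an example and not part of the Amtrak data) shows an ACF with strong positive lag-1 autocorrelation (trend/stickiness) together with peaks at lags 12, 24, and 36 (annual seasonality):

#Sketch: ACF of a trending monthly series with annual seasonality
#(AirPassengers is a built-in R data set, used purely for illustration)
library(forecast)
Acf(AirPassengers, lag.max = 36, main = "ACF of monthly airline passengers")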
AR models are similar to linear regression models, except that the predictors are the past values of the series
An autoregressive model of order \(p\) can be written as
\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \]
where \(\epsilon_t\) is white noise.
This is like a multiple regression but with lagged values of \(y_t\) as predictors
We refer to this as an \(AR(p)\) model, an autoregressive model of order \(p\)
Autoregressive models are remarkably flexible at handling a wide range of different time series patterns
Below are the equations of two example AR models (a short simulation sketch follows):
AR(1): \(y_t =18 - 0.8 y_{t-1} + \epsilon_t\)
AR(2): \(y_t = 8 + 1.3 y_{t-1} - 0.7y_{t-2} + \epsilon_t\)
In both equations \(\epsilon_t \sim N(0, 1)\)
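A minimal simulation sketch of these two models using arima.sim() (which generates zero-mean series, so the means implied by the constants are added back manually):

library(fpp2)
set.seed(1)

#AR(1): y_t = 18 - 0.8*y_{t-1} + e_t, which has mean 18/(1 + 0.8) = 10
ar1_sim <- 10 + arima.sim(model = list(ar = -0.8), n = 100)

#AR(2): y_t = 8 + 1.3*y_{t-1} - 0.7*y_{t-2} + e_t, which has mean 8/(1 - 1.3 + 0.7) = 20
ar2_sim <- 20 + arima.sim(model = list(ar = c(1.3, -0.7)), n = 100)

autoplot(cbind("AR(1)" = ar1_sim, "AR(2)" = ar2_sim), facets = TRUE)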
Changing the parameters \(\phi_1, \phi_2, \cdots, \phi_p\) results in different time series patterns
The variance of the error term \(\epsilon_t\) will only change the scale of the series, not the patterns
For an AR(1) model: when \(\phi_1 = 0\), \(y_t\) is equivalent to white noise; when \(\phi_1 = 1\) and \(c = 0\), \(y_t\) is equivalent to a random walk; when \(\phi_1 = 1\) and \(c \neq 0\), \(y_t\) is equivalent to a random walk with drift; and when \(\phi_1 < 0\), \(y_t\) tends to oscillate around the mean
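A quick sketch simulating the first three of these cases directly (the drift value 0.2 is an arbitrary choice for illustration):

library(fpp2)
set.seed(123)
e <- rnorm(200)

wn  <- ts(e)                  #phi1 = 0: white noise
rw  <- ts(cumsum(e))          #phi1 = 1, c = 0: random walk
rwd <- ts(cumsum(0.2 + e))    #phi1 = 1, c = 0.2: random walk with drift

autoplot(cbind("White noise" = wn,
               "Random walk" = rw,
               "Random walk with drift" = rwd),
         facets = TRUE)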
We normally restrict autoregressive models to stationary data, in which case some constraints on the values of the parameters are required
For an AR(1) model: \(-1 < \phi_1 < 1\)
For an AR(2) model: \(-1 < \phi_2 < 1\), \(\phi_1 + \phi_2 < 1\), \(\phi_2 - \phi_1 < 1\)
When \(p \geq 3\), the restrictions are much more complicated. R takes care of these restrictions when estimating a model.
Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors (\(\epsilon_t\)) in a regression-like model. \[ y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \]
where \(\epsilon_t\) is white noise.
We refer to this as an \(MA(q)\) model, a moving average model of order \(q\).
Note that we do not actually observe the values of \(\epsilon_t\), so it is not really a regression in the usual sense.
Notice that each value of \(y_t\) can be thought of as a weighted moving average of the past few forecast errors
Below are the equations of two example MA models (a short simulation sketch follows):
MA(1): \(y_t = 20 + \epsilon_t + 0.8 \epsilon_{t-1}\)
MA(2): \(y_t = \epsilon_t - \epsilon_{t-1} + 0.8 \epsilon_{t-2}\)
In both cases \(\epsilon_t \sim N(0, 1)\)
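A matching simulation sketch for these two MA models with arima.sim() (again adding the MA(1) mean of 20 manually, since arima.sim() produces zero-mean series):

library(fpp2)
set.seed(2)

#MA(1): y_t = 20 + e_t + 0.8*e_{t-1}, which has mean 20
ma1_sim <- 20 + arima.sim(model = list(ma = 0.8), n = 100)

#MA(2): y_t = e_t - e_{t-1} + 0.8*e_{t-2}, which has mean 0
ma2_sim <- arima.sim(model = list(ma = c(-1, 0.8)), n = 100)

autoplot(cbind("MA(1)" = ma1_sim, "MA(2)" = ma2_sim), facets = TRUE)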
Invertibility constraints on the MA parameters (an MA model with finite \(q\) is always stationary, but we require invertibility so that the model can be written as a convergent autoregressive representation): for an MA(1) model, \(-1 < \theta_1 < 1\); for an MA(2) model, \(-1 < \theta_2 < 1\), \(\theta_1 + \theta_2 > -1\), \(\theta_1 - \theta_2 < 1\)
More complicated conditions hold for \(q \geq 3\). Again, R will take care of these constraints when estimating the models
If we combine differencing with autoregression and a moving average model, we obtain a (non-seasonal) ARIMA model
ARIMA is an acronym for AutoRegressive Integrated Moving Average (in this context, “integration” is the reverse of differencing)
The full model can be written as \[ y_t^{'} = c + \phi_1 y_{t-1}^{'} + \cdots + \phi_p y_{t-p}^{'} + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t \] where \(y_t^{'}\) is the differenced series (it may have been differenced more than once), and the predictors on the right-hand side include both lagged values of \(y_t^{'}\) and lagged errors
We call this an \(ARIMA(p,d,q)\) model, where: \(p\) = order of the autoregressive part; \(d\) = degree of first differencing involved; and \(q\) = order of the moving average part
Many of the models we have already discussed are special cases of the ARIMA model: white noise is ARIMA(0,0,0); a random walk is ARIMA(0,1,0) with no constant; a random walk with drift is ARIMA(0,1,0) with a constant; an AR(\(p\)) model is ARIMA(\(p\),0,0); and an MA(\(q\)) model is ARIMA(0,0,\(q\))
Selecting appropriate values for \(p, d, \text{and}\; q\) can be difficult
The auto.arima() function in R will do it automatically for us
The use of the auto.arima() function is illustrated in the following code chunks
autoplot(uschange[,"Consumption"]) +
xlab("Year") + ylab("Quarterly percentage change")
fit <- auto.arima(uschange[,"Consumption"], seasonal=FALSE)
fit
## Series: uschange[, "Consumption"]
## ARIMA(1,0,3) with non-zero mean
##
## Coefficients:
## ar1 ma1 ma2 ma3 mean
## 0.5885 -0.3528 0.0846 0.1739 0.7454
## s.e. 0.1541 0.1658 0.0818 0.0843 0.0930
##
## sigma^2 = 0.3499: log likelihood = -164.81
## AIC=341.61 AICc=342.08 BIC=361
\[ y_t = c + 0.589 y_{t-1} - 0.353 \epsilon_{t-1} + 0.085 \epsilon_{t-2} + 0.174 \epsilon_{t-3} + \epsilon_t \]
Here \(c = 0.745 \times (1 − 0.589) = 0.307\) and \(\epsilon_t\) is white noise with standard deviation of \(\sqrt{0.3499} \approx 0.592\)
Forecasts from the model are obtained and plotted using the following code chunk:
fit %>%
  forecast(h=10) %>%
  autoplot(include=80)
The auto.arima() function is useful, but anything automated can be a little dangerous, and it is worth understanding something of the behavior of the models even when you rely on an automatic procedure to choose the model for you.
The constant \(c\) has an important effect on the long-term forecasts obtained from these models: for example, if \(c = 0\) and \(d = 0\) the long-term forecasts go to zero, whereas if \(c \neq 0\) and \(d = 0\) they go to the mean of the data; larger values of \(d\) with \(c \neq 0\) produce long-term forecasts that follow a linear or quadratic trend
The value of \(d\) also has an effect on the prediction intervals — the higher the value of \(d\), the more rapidly the prediction intervals increase in size
For \(d=0\), the long-term forecast standard deviation will go to the standard deviation of the historical data, so the prediction intervals will all be essentially the same
The value of \(p\) is important if the data show cycles
To obtain cyclic forecasts, it is necessary to have \(p \geq 2\), along with some additional conditions on the parameters.
For an AR(2) model, cyclic behavior occurs if \(\phi_1^2 + 4 \phi_2 < 0\). In that case, the average period of the cycles is \[ \frac{2 \pi}{\arccos\left(-\phi_1(1-\phi_2)/(4\phi_2)\right)} \]
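As a worked example (using the AR(2) coefficients from the earlier example, \(\phi_1 = 1.3\) and \(\phi_2 = -0.7\)), the cyclicity condition and the average cycle length can be computed directly:

phi1 <- 1.3
phi2 <- -0.7

phi1^2 + 4 * phi2                              #-1.11 < 0, so the series is cyclic

2 * pi / acos(-phi1 * (1 - phi2) / (4 * phi2)) #average cycle length of about 9.5 periods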
Selecting appropriate values for \(p\), \(d\), and \(q\) can be difficult
The autocorrelation function (ACF) plot is a bar chart of coefficients of correlation between a time series and its lagged values
ACF plot shows the autocorrelations between \(y_t\) and \(y_{t-k}\) for different values of \(k\)
The partial autocorrelation function (PACF) plot shows the relationship between \(y_t\) and \(y_{t-k}\) after removing the effects of lags \(1,2,\cdots,k-1\)
If the time series is from an \(ARIMA(p,d,0)\) or \(ARIMA(0,d,q)\), then the ACF and PACF plots can be helpful in determining the values of \(p\) and \(q\)
The data may follow an \(ARIMA(p,d,0)\) model if the ACF and PACF plots of the differenced data show the following patterns: the ACF is exponentially decaying or sinusoidal, and there is a significant spike at lag \(p\) in the PACF but none beyond lag \(p\)
The data may follow an \(ARIMA(0,d,q)\) model if the ACF and PACF plots of the differenced data show the following patterns: the PACF is exponentially decaying or sinusoidal, and there is a significant spike at lag \(q\) in the ACF but none beyond lag \(q\)
[Figures: example AR(1), AR(2), MA(1), and MA(2) processes]
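As a sketch of what these patterns look like in practice, we can simulate an AR(2) process (reusing the earlier example coefficients) and inspect its ACF and PACF with ggtsdisplay(): the ACF decays in a sinusoidal pattern, while the PACF cuts off after lag 2.

library(fpp2)
set.seed(42)

#Simulates an AR(2) process with phi1 = 1.3, phi2 = -0.7
ar2_sim <- arima.sim(model = list(ar = c(1.3, -0.7)), n = 300)

#Time plot, ACF and PACF in one display
ggtsdisplay(ar2_sim, main = "Simulated AR(2) process")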
Once the model order has been identified, that is, the values of \(p\), \(d\), and \(q\) have been identified, we need to estimate the parameters \(c, \phi_1, \cdots, \phi_p, \theta_1, \cdots, \theta_q\)
Maximum likelihood estimation (MLE) finds the values of the parameters which maximize the probability of obtaining the data
The values of \(p\), \(d\), and \(q\) obtained from the ACF and PACF serve as the initial order of the ARIMA model
We can experiment with different models in the neighborhood of the initial model
Several statistical measures can be used to select the “best” fitting ARIMA model: AIC, BIC, etc
Akaike's Information Criterion (AIC) can be used to compare competing ARIMA models \[ AIC = -2\log(L) + 2(p + q + k + 1) \] where: \(L\) is the likelihood; \(k=1\) if \(c \neq 0\) and \(k=0\) if \(c=0\)
To adjust for overfitting (too many parameters) when the sample size is small, we use the corrected AIC (AICc) \[ AICc = AIC + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2} \]
The Bayesian Information Criterion (BIC) can be written as \[ BIC = AIC + [\log(T)-2](p+q+k+1) \]
Good models have minimum AIC, AICc or BIC
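A minimal sketch of this model-comparison step (assuming the uschange consumption series from fpp2 and a few arbitrary candidate orders): fit several candidate ARIMA models and keep the one with the smallest AICc.

library(fpp2)

#Fits a few candidate (arbitrarily chosen) models to the quarterly consumption series
fit_a <- Arima(uschange[, "Consumption"], order = c(1, 0, 0))
fit_b <- Arima(uschange[, "Consumption"], order = c(1, 0, 3))
fit_c <- Arima(uschange[, "Consumption"], order = c(2, 0, 2))

#Smaller AICc is better
c(ARIMA_100 = fit_a$aicc, ARIMA_103 = fit_b$aicc, ARIMA_202 = fit_c$aicc)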
General process for forecasting using an ARIMA model: plot the data (and transform if needed to stabilise the variance); difference until the data are stationary; examine the ACF/PACF of the differenced data to choose candidate orders; fit the candidate models and compare them using the AICc; check that the residuals of the chosen model look like white noise; and then compute forecasts. The steps are illustrated below on the seasonally adjusted electrical equipment orders data (elecequip).
library(fpp2)
library(ggplot2)
library(dplyr)
#Removes the seasonal component using an STL decomposition
elecequip %>% stl(s.window='periodic') %>% seasadj() -> eeadj
autoplot(eeadj)
#Plots the differenced series together with its ACF and PACF
eeadj %>% diff() %>% ggtsdisplay(main="")
#Fits an ARIMA(3,1,1) model to the seasonally adjusted series
fit1 <- Arima(eeadj, order=c(3,1,1))
summary(fit1)
## Series: eeadj
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 0.0044 0.0916 0.3698 -0.3921
## s.e. 0.2201 0.0984 0.0669 0.2426
##
## sigma^2 = 9.577: log likelihood = -492.69
## AIC=995.38 AICc=995.7 BIC=1011.72
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03288179 3.054718 2.357169 -0.00647009 2.481603 0.28844
## ACF1
## Training set 0.008980578
#Checks whether the residuals resemble white noise (residual ACF and Ljung-Box test)
checkresiduals(fit1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 24.034, df = 20, p-value = 0.2409
##
## Model df: 4. Total lags used: 24
#The Ljung-Box p-value (0.24) is large, so the residuals are consistent
#with white noise and the model can be used for forecasting
autoplot(forecast(fit1))