ACF

Here I will try to explain more of the concepts behind the autocorrelation function (ACF) and the partial autocorrelation function (PACF). We will begin with the ACF. The formula for the ACF can be derived from the covariance. Recall that for a stationary series, we require the following, \[E(X_t)=\mu \text{ for all } t\] \[Cov(X_t, X_{t+j}) \text{ is the same for all } t \text{ (for all nonnegative } j)\]

To better understand the first equation regarding the expected value of \(X_t\), note that this condition can be checked by simply taking the expected value of the series. For example, given the \(MA(1)\) series \(X_t = \theta \varepsilon_{t-1} + \varepsilon_t\), we can do the following, \[E(X_t) = E(\theta \varepsilon_{t-1} + \varepsilon_t)\] \[=E(\theta \varepsilon_{t-1}) + E(\varepsilon_t)\] \[=\theta E(\varepsilon_{t-1}) + 0\] \[=0 + 0 = 0.\]

In the above we see that the mean is the same for all \(t\), because the result does not depend on \(t\). The key fact behind the derivation is that white noise is independently distributed with mean 0, so the expected value of white noise at any time is always 0. It is important to note that not all series have the same \(\mu\) for all \(t\).
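
We can also check this numerically. Below is a minimal sketch (the seed, the series length, and \(\theta=0.6\) are just illustrative choices) that builds an \(MA(1)\) directly from simulated white noise and confirms the sample mean is close to 0.

set.seed(1)
eps <- rnorm(10000)                          # white noise with mean 0, sd 1
theta <- 0.6
X <- theta * head(eps, -1) + tail(eps, -1)   # X_t = theta * eps_{t-1} + eps_t
mean(X)                                      # close to 0, as the derivation predicts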

We will next look at how the ACF plot is created. Using the same series as above, let us look at the covariance. \[Cov(X_t, X_{t+j}) = Cov(\theta \varepsilon_{t-1} + \varepsilon_t, \theta \varepsilon_{t+j-1} + \varepsilon_{t+j})\] \[= Cov(\theta \varepsilon_{t-1}, \theta \varepsilon_{t+j-1}) + Cov(\theta \varepsilon_{t-1}, \varepsilon_{t+j}) + Cov(\varepsilon_t, \theta \varepsilon_{t+j-1}) + Cov(\varepsilon_t, \varepsilon_{t+j})\] \[= \theta^2Cov(\varepsilon_{t-1}, \varepsilon_{t+j-1}) + \theta Cov(\varepsilon_{t-1}, \varepsilon_{t+j}) + \theta Cov(\varepsilon_t, \varepsilon_{t+j-1}) + Cov(\varepsilon_t, \varepsilon_{t+j})\] Recall that white noise is independently distributed, so the covariance between white noise terms at different times is 0. Given this result, we must look carefully at each value of \(j=0,1,\cdots\). Luckily, the pattern is obvious, so we can stop calculating after a few steps. Let us first look at the case \(j=0\).

\[= \theta^2Cov(\varepsilon_{t-1}, \varepsilon_{t+0-1}) + \theta Cov(\varepsilon_{t-1}, \varepsilon_{t+0}) + \theta Cov(\varepsilon_t, \varepsilon_{t+0-1}) + Cov(\varepsilon_t, \varepsilon_{t+0})\] Now, using the fact that \(Var(\varepsilon_t) = Cov(\varepsilon_t, \varepsilon_t) = \sigma^2\), \[= \theta^2\sigma^2 + 0 + 0 + \sigma^2\] \[= (\theta^2 + 1) \sigma^2\] For \(j=1\), the only surviving term is \(\theta Cov(\varepsilon_t, \varepsilon_{t+1-1}) = \theta Cov(\varepsilon_t, \varepsilon_t) = \theta\sigma^2\), so the result is \(\theta \sigma^2\). For \(j\geq2\), none of the indices on the left hand side of any covariance match those on the right hand side, so every term involves independent white noise and the result is 0. We can express this result as follows, \[ \gamma(j) = Cov(X_t, X_{t+j}) = \begin{cases} (1 + \theta^2) \sigma^2 & j=0 \\ \theta\sigma^2 & j=1 \\ 0 & j\geq2 \end{cases} \] Also, it is worthwhile to note that \(\gamma(-j) = \gamma(j)\), so we only need to count from 0 upwards. These steps, which give the autocovariance \(\gamma(j)\), are the ingredients of the autocorrelation function \(\rho(j)\), where \(\rho(j) = \gamma(j)/\gamma(0)\).

Notice that the ACF for the \(MA(1)\) process drops to 0 right after \(j=1\). This makes sense since \(\rho(2) = \gamma(2)/\gamma(0) = 0/((1+\theta^2)\sigma^2)=0\).
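
As a quick worked example using the value \(\theta=0.6\) that appears in the plot below, the lag-1 autocorrelation is \[\rho(1) = \frac{\gamma(1)}{\gamma(0)} = \frac{\theta\sigma^2}{(1+\theta^2)\sigma^2} = \frac{\theta}{1+\theta^2} = \frac{0.6}{1.36} \approx 0.441,\] which is the height of the lag-1 spike in the theoretical ACF plotted below. Note that \(\sigma^2\) cancels, so the ACF does not depend on the white noise variance.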

Below I plot the theoretical ACF for the given \(MA(1)\), with \(\theta=0.6\). It is evident that the values drop to 0 after lag 1. Also, notice that at lag 0 the ACF is always 1. This is because \(\rho(0) = \gamma(0)/\gamma(0) = 1\).

jmax <- 5                                           # largest lag to plot
lags <- 0:jmax
rhos <- ARMAacf(ma = c(0.6), lag.max = jmax)        # theoretical ACF of an MA(1) with theta = 0.6
plot(lags, rhos, pch = 19, xlab = "Lag", ylab = "ACF", ylim = c(-0.01, 1),
     main = 'Autocorrelation Function at theta = 0.6')
segments(x0 = lags, y0 = 0, x1 = lags, y1 = rhos)   # vertical spikes at each lag
abline(h = 0)

PACF

The above process is quite straightforward for an \(MA(q)\) process, but it is more complicated for an \(AR(p)\) process. In fact, given what you have seen above, it now makes sense that any \(MA(q)\) process is stationary: it is a finite sum of white noise terms, so its mean is constant and its autocovariance depends only on the lag. This does not carry over to \(AR(p)\) processes, because the current value depends on its own past and the series does not break down into a fixed, finite number of white noise terms. Instead, the PACF relies on a much longer equation, but it is not necessarily too difficult either.

The only simple \(AR(p)\) to work with is \(AR(1)\), but we won't go into it here. For other \(AR(p)\) processes, there is no simple closed-form way to understand all of the PACFs. There is in fact a method based on the Yule-Walker equations, but it was not emphasized in my class, so I won't go too deep into it here either.
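
For the curious, here is a minimal sketch of what Yule-Walker estimation looks like in R; the simulated \(AR(1)\) with \(\phi=0.7\) and the sample size are just illustrative choices.

set.seed(101)
y <- arima.sim(n = 200, model = list(ar = 0.7), sd = 1)   # an illustrative AR(1) with phi = 0.7
ar(y, method = "yule-walker")                             # Yule-Walker fit: selects an order and estimates the coefficients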

The interpretation of ACF and PACF plots, however, follows the same idea. If the ACF cuts off after lag \(j\), that points to an \(MA(q)\) with \(q=j\); if the PACF cuts off after lag \(j\), that points to an \(AR(p)\) with \(p=j\). It is only in theoretical examples that we see such clean plots. When using real or even simulated data, the results require closer examination.

NOTE: The PACF plots start at lag 1.
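
To illustrate both points, here is a quick sketch using the same ARMAacf() function as above, this time with pacf = TRUE; the \(AR(1)\) coefficient \(\phi=0.5\) is just an illustrative choice. The returned values begin at lag 1, and the PACF of an \(AR(1)\) cuts off right after lag 1.

jmax <- 5
phis <- ARMAacf(ar = c(0.5), lag.max = jmax, pacf = TRUE)   # theoretical PACF, lags 1 through jmax
plot(1:jmax, phis, pch = 19, xlab = "Lag", ylab = "PACF",
     main = 'Partial Autocorrelation Function of AR(1) at phi = 0.5')
segments(x0 = 1:jmax, y0 = 0, x1 = 1:jmax, y1 = phis)       # vertical spikes at each lag
abline(h = 0)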

Simulation

Below we will run some quick simulations to get an idea of what actual ACF and PACF plots may look like. The first simulation is for an \(MA(2)\) process.

set.seed(101)
MA2 <- arima.sim(n = 100, model = list(ma = c(0.9, 0.6)), sd = 1)   # simulate 100 observations from an MA(2)
acf(MA2, main = 'MA(2) theta1=0.9, theta2=0.6 ACF')

pacf(MA2, main = 'MA(2) theta1=0.9, theta2=0.6 PACF')

Next we will simulate an \(AR(2)\) process.

set.seed(101)
AR2 <- arima.sim(n = 100, model = list(ar = c(0.9, -0.6)), sd = 1)   # simulate 100 observations from an AR(2)
acf(AR2, main = 'AR(2) phi1=0.9, phi2=-0.6 ACF')

pacf(AR2, main = 'AR(2) phi1=0.9, phi2=-0.6 PACF')

It is clear that when you examine the \(MA(2)\) through an ACF and the \(AR(2)\) through a PACF, the cutoff happens after lag 2 in each of those corresponding plots. In the other plots, however, you see a slow 'tailing off', which is typical. If you encounter a rough where the ACF and the PACF are both tailing off, it indicates that there is an underlying \(ARMA(p,q)\) model.

However, with real-world data we can't be certain of the actual model. We can merely estimate it from the available sample data. For each \(ARMA(p,q)\) model of the rough that we fit, where \(p=0,1,\cdots\) and \(q=0,1,\cdots\), there is a corresponding criterion value such as \(AICc\). We can then fit several plausible models, suggested by the appearance of the ACF and PACF plots, and see which one has the smallest \(AICc\) value. It is also possible to use the function auto.arima() from the forecast package to do this task. However, its results will not always match a manual search, so it is up to you which approach you choose. The difference could lie in how the candidate models and criteria are chosen, but I am not sure of the exact reason.
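
As a rough sketch of what that comparison might look like (assuming the forecast package is installed, and reusing the MA2 series simulated above; the candidate orders are just illustrative choices):

library(forecast)
candidates <- list(c(0, 0, 1), c(0, 0, 2), c(1, 0, 1), c(2, 0, 0))        # (p, d, q) orders to try
aiccs <- sapply(candidates, function(ord) Arima(MA2, order = ord)$aicc)   # AICc for each candidate model
names(aiccs) <- sapply(candidates, paste, collapse = ",")
aiccs                                                                     # smaller AICc suggests a better fit
auto.arima(MA2)                                                           # automated search; may pick a different model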