Time series are a little different from other types of data. Time series data often has long-term trends or periodic patterns that traditional summary statistics don’t capture. To find these patterns, you need to use different types of analyses.

1 Stationarity

The foundation of time series analysis is stationarity.

1.1 strictly stationary

A time series \({r_t}\) is said to be strictly stationary if the joint distribution of \((r_{t_1}, \cdots, r_{t_k})\) is identical to that of \((r_{t_1+t}, \cdots, r_{t_k+t})\) for all t, where k is an arbitrary positive integer and \((t_1, \cdots, t_k)\) is a collection of k positive integers. In other words, strict stationarity requires that the joint distribution of \((r_{t_1}, \cdots, r_{t_k})\) is invariant under time shift. This is a very strong condition that is hard to verify empirically.

1.2 weakly stationary

A weaker version of stationarity is often assumed. A time series \({r_t}\) is weakly stationary if both the mean of \(r_t\) and the covariance between \(r_t\) and \(r_{t-\ell}\) are time invariant, where \(\ell\) is an arbitrary integer. More specifically, \({r_t}\) is weakly stationary if

  1. \(E(r_t) = \mu\), which is a constant, and

  2. \(Cov(r_{t} ,r_{t-\ell}) = \gamma_\ell\), which only depends on \(\ell\).

In practice, suppose that we have observed T data points \(\{r_t |t = 1,... ,T\}\). Weak stationarity implies that the time plot of the data would show the T values fluctuating with constant variation around a fixed level, as in the sketch below. In applications, weak stationarity enables one to make inference concerning future observations (e.g., prediction).
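
To see what constant variation around a fixed level looks like, here is a minimal R sketch (my own illustration with simulated data, not one of the data sets analyzed below): it compares a weakly stationary AR(1) series with a random walk, which is not stationary because its variance grows with time.

set.seed(123)
n <- 300
stat <- arima.sim(model = list(ar = 0.5), n = n) # weakly stationary AR(1) series
rw <- cumsum(rnorm(n))                           # random walk: variance grows with t
par(mfrow = c(2, 1))
plot.ts(stat, main = "Weakly stationary: fluctuates around a fixed level")
plot.ts(rw, main = "Random walk: no fixed level")
par(mfrow = c(1, 1))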

1.3 Some examples

1.3.1 Example 1 Johnson & Johnson Quarterly Earnings

The following figure shows quarterly earnings per share for the U.S. company Johnson & Johnson, furnished by Professor Paul Griffin (personal communication) of the Graduate School of Management, University of California, Davis.

load("tsa3.rda")
plot(jj, type="o", ylab="Quarterly Earnings per Share")

There are 84 quarters (21 years) measured from the first quarter of 1960 to the last quarter of 1980. Modeling such series begins by observing the primary patterns in the time history. In this case, note the gradually increasing underlying trend and the rather regular variation superimposed on the trend that seems to repeat over quarters.

1.3.2 Example 2 Speech Data

The following figure shows a small 0.1 second (1000 points) sample of recorded speech for the phrase \(aaa \cdots hhh\), and we note the repetitive nature of the signal and the rather regular periodicities. One can immediately notice the regular repetition of small wavelets. The separation between the packets is known as the pitch period and represents the response of the vocal tract filter to a periodic sequence of pulses stimulated by the opening and closing of the glottis.

plot(speech)

1.3.3 Example 3 New York Stock Exchange

As an example of financial time series data, the following figure shows the daily returns (or percent change) of the New York Stock Exchange (NYSE) from February 2, 1984 to December 31, 1991. It is easy to spot the crash of October 19, 1987 in the figure. The mean of the series appears to be stable with an average return of approximately zero; however, the volatility (or variability) of the data changes over time. In fact, the data show volatility clustering; that is, highly volatile periods tend to be clustered together. A problem in the analysis of this type of financial data is to forecast the volatility of future returns. Models such as ARCH, GARCH, and stochastic volatility models have been developed to handle these problems.

plot(nyse, ylab="NYSE Returns")

2 Autocorrelation Functions (ACF)

2.1 the lag-\(\ell\) autocovariance of \(r_t\)

The covariance \(\gamma_\ell = Cov(r_t, r_{t-\ell})\) is called the lag-\(\ell\) autocovariance of \(r_t\). It has two important properties: (a) \(\gamma_0 = Var(r_t)\) and (b) \(\gamma_{-\ell} = \gamma_\ell\).

2.2 ACF

The correlation coefficient between two random variables X and Y is defined as \[\rho_{x,y} = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}.\]

Consider a weakly stationary series \(r_t\). When the linear dependence between \(r_t\) and its past values \(r_{t-i}\) is of interest, the concept of correlation is generalized to autocorrelation. The correlation coefficient between \(r_t\) and \(r_{t-\ell}\) is called the lag-\(\ell\) autocorrelation of \(r_t\) and is commonly denoted by \(\rho_\ell\), which under the weak stationarity assumption is a function of \(\ell\) only. Specifically, we define \[\rho_\ell = \frac{Cov(r_t, r_{t-\ell})}{\sqrt{Var(r_t) Var(r_{t-\ell})}} = \frac{Cov(r_t, r_{t-\ell})}{Var(r_t)} = \frac{\gamma_\ell}{\gamma_0},\] where the property \(Var(r_t ) = Var(r_{t-\ell})\) for a weakly stationary series is used.

For a given sample of time series \(\{r_t\}_{t=1}^T\), let \(\bar{r}\) be the sample mean, i.e., \(\bar{r} = \sum_{t=1}^T r_t/T\). Then the lag-\(\ell\) sample autocorrelation of \(r_t\) is

\[\hat{\rho}_\ell = \frac{\sum_{t=\ell+1}^T(r_t-\bar{r})(r_{t-\ell}-\bar{r})}{\sum_{t=1}^T(r_t-\bar{r})^2}, \quad 0\le\ell<T-1.\] If \(r_t\) is a weakly stationary time series satisfying \(r_t = \mu + \sum_{i=0}^q \psi_ia_{t-i}\), where \(\psi_0 = 1\) and \(\{a_j\}\) is a sequence of iid random variables with mean zero, then \(\hat{\rho}_\ell\) is asymptotically normal with mean zero and variance \((1+2\sum_{i=1}^q \rho_i^2)/T\) for \(\ell > q\). This is referred to as Bartlett’s formula in the time series literature; see Box, Jenkins, and Reinsel (1994).

You can estimate the autocorrelation function for time series using R’s acf function:

acf(x, lag.max = NULL,
    type = c("correlation", "covariance", "partial"),
    plot = TRUE, na.action = na.fail, demean = TRUE, ...)
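
As a quick check that acf implements the sample-ACF formula above, the following sketch (my own illustration on simulated data, not one of the data sets used below) computes \(\hat{\rho}_1\) by hand and compares it with the value acf returns:

set.seed(1)
r <- arima.sim(model = list(ar = 0.5), n = 200) # simulated series for illustration
rbar <- mean(r)
n <- length(r)                                  # sample size T in the formula
# lag-1 sample autocorrelation computed directly from the formula
rho1_hand <- sum((r[2:n] - rbar) * (r[1:(n - 1)] - rbar)) / sum((r - rbar)^2)
rho1_acf <- acf(r, lag.max = 1, plot = FALSE)$acf[2] # element 1 is lag 0
c(rho1_hand, rho1_acf)                               # the two values agree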

2.3 Testing Individual ACF

For a given positive integer \(\ell\), the previous result can be used to test \(H_0:\rho_\ell = 0\) vs \(H_a:\rho_\ell \neq 0\). The test statistic is \[t~ratio = \frac{\hat{\rho}_\ell}{\sqrt{(1+2\sum_{i=1}^{q}\hat{\rho}_i^2)/T}}.\]

Hence, the decision rule of the test is to reject \(H_0\) if \(|t~ratio| > Z_{\alpha/2}\),where \(Z_{\alpha/2}\) is the \(100(1 - \alpha/2)\)th percentile of the standard normal distribution. For simplicity, many software packages use \(1/T\) as the asymptotic variance of \(\hat{\rho}_\ell\) for all \(\ell \neq 0\). They essentially assume that the underlying time series is an iid sequence.
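
The decision rule is easy to carry out directly. The short sketch below (an illustration on simulated iid data, not part of the examples that follow) uses the simplified \(1/T\) asymptotic variance mentioned above, which is also what the dashed bands in R's acf plot are based on:

set.seed(2)
r <- rnorm(250)   # iid series, so every true autocorrelation is zero
n <- length(r)    # sample size T
rho1 <- acf(r, lag.max = 1, plot = FALSE)$acf[2]
t_ratio <- rho1 / sqrt(1 / n)      # t ratio with asymptotic variance 1/T
abs(t_ratio) > qnorm(1 - 0.05 / 2) # TRUE would reject H0: rho_1 = 0 at the 5% level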

In finite samples, \(\hat{\rho}_\ell\) is a biased estimator of \(\rho_\ell\). The bias is in the order of \(1/T\), which can be substantial when the sample size T is small. In most financial applications, T is relatively large so that the bias is not serious.

2.4 Portmanteau Test

Financial applications often require testing jointly that several autocorrelations of \(r_t\) are zero. Box and Pierce (1970) propose the Portmanteau statistic \[Q^*(m) = T\sum_{\ell=1}^m \hat{\rho}_\ell^2\] as a test statistic for the null hypothesis \(H_0: \rho_1 = \cdots = \rho_m = 0\) against the alternative hypothesis \(H_a : \rho_i \neq 0\) for some \(i \in \{1, \cdots ,m\}\). Under the assumption that \(r_t\) is an iid sequence with certain moment conditions, \(Q^*(m)\) is asymptotically a chi-squared random variable with m degrees of freedom.

Ljung and Box (1978) modify the \(Q^*(m)\) statistic as below to increase the power of the test in finite samples, \[Q(m) = T(T+2)\sum_{\ell=1}^{m} \frac{\hat{\rho}_\ell^2}{T-\ell}.\]
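
The statistic is straightforward to reproduce from the sample ACFs. The sketch below (simulated data, for illustration only; the real-data tests follow in the next subsection) computes \(Q(m)\) by hand and checks it against Box.test:

set.seed(3)
r <- rnorm(300)            # iid series for illustration
n <- length(r); m <- 10
rho_hat <- acf(r, lag.max = m, plot = FALSE)$acf[2:(m + 1)] # rho_hat_1, ..., rho_hat_m
Q <- n * (n + 2) * sum(rho_hat^2 / (n - 1:m))               # Ljung-Box Q(m)
c(Q, Box.test(r, lag = m, type = "Ljung")$statistic)        # should match
1 - pchisq(Q, df = m)                                       # p value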

2.5 Some examples

The sample autocorrelation function (ACF) of \(r_t\), \(\hat{\rho}_1, \hat{\rho}_2, \cdots\), plays an important role in linear time series analysis. As a matter of fact, a linear time series model can be characterized by its ACF, and linear time series modeling makes use of the sample ACF to capture the linear dynamics of the data. The following figure shows the sample autocorrelation functions of monthly simple and log returns of IBM stock from January 1926 to December 2008.

da=read.table("m-ibm3dx2608.txt",header=T) # Load data 
da[1,] # Check the 1st row of the data
## date ibmrtn vwrtn ewrtn sprtn
## 19260130 -0.010381 0.000724 0.023174 0.022472
sibm=da[,2] # Get the IBM simple returns 
libm=log(sibm+1) # Log IBM returns
acf(sibm)

acf(libm)

The two sample ACFs are very close to each other, and they suggest that the serial correlations of monthly IBM stock returns are very small, if any. The sample ACFs are all within their two standard error limits, indicating that they are not significantly different from zero at the 5% level. In addition, for the simple returns, the Ljung-Box statistics give \(Q(5) = 3.37\) and \(Q(10) = 13.99\), which correspond to p values of 0.64 and 0.17, respectively, based on chi-squared distributions with 5 and 10 degrees of freedom. For the log returns, we have \(Q(5) = 3.52\) and \(Q(10) = 13.39\) with p values 0.62 and 0.20, respectively. The joint tests confirm that monthly IBM stock returns have no significant serial correlations.

Box.test(sibm,lag=5,type='Ljung') # Ljung-Box statistic Q(5) Box-Ljung test
## 
##  Box-Ljung test
## 
## data:  sibm
## X-squared = 3.3682, df = 5, p-value = 0.6434
Box.test(sibm,lag=10,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  sibm
## X-squared = 13.99, df = 10, p-value = 0.1734
Box.test(libm,lag=5,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  libm
## X-squared = 3.5236, df = 5, p-value = 0.6198
Box.test(libm,lag=10,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  libm
## X-squared = 13.386, df = 10, p-value = 0.2029

The following figure shows the same for the monthly returns of the value-weighted index from the Center for Research in Security Prices (CRSP), at the University of Chicago.

crsp=da[,3] # Get the CRSP simple returns 
lcrsp=log(crsp+1) # Log CRSP returns
acf(crsp)

acf(lcrsp)

There are some significant serial correlations at the 5% level for both return series. The Ljung-Box statistics give \(Q(5) = 29.71\) and \(Q(10) = 39.55\) for the simple returns and \(Q(5) = 28.38\) and \(Q(10) = 36.16\) for the log returns. The p values of these four test statistics are all less than 0.0001, suggesting that monthly returns of the value-weighted index are serially correlated. Thus, the monthly market index return seems to have stronger serial dependence than individual stock returns.

Box.test(crsp,lag=5,type='Ljung') # Ljung-Box statistic Q(5) Box-Ljung test
## 
##  Box-Ljung test
## 
## data:  crsp
## X-squared = 29.711, df = 5, p-value = 1.681e-05
Box.test(crsp,lag=10,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  crsp
## X-squared = 39.546, df = 10, p-value = 2.037e-05
Box.test(lcrsp,lag=5,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  lcrsp
## X-squared = 28.379, df = 5, p-value = 3.069e-05
Box.test(lcrsp,lag=10,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  lcrsp
## X-squared = 36.162, df = 10, p-value = 7.896e-05

3 White noise and linear time series

3.1 White Noise

A time series \(r_t\) is called a white noise if \({r_t}\) is a sequence of independent and identically distributed random variables with finite mean and variance. In particular, if \(r_t\) is normally distributed with mean zero and variance \(\sigma^2\), the series is called a Gaussian white noise. For a white noise series, all the ACFs are zero.

The following figure shows a collection of 500 such random variables, with variance \(\sigma^2 = 1\), plotted in the order in which they were drawn.

w = rnorm(500,0,1) # 500 N(0,1) variates
plot.ts(w, main="white noise") 

In practice, if all sample ACFs are close to zero, then the series is a white noise series. Based on the examples in the previous section, the monthly returns of IBM stock are close to white noise, whereas those of the value-weighted index are not.

plot(ts(sibm))

plot(ts(libm))

plot(ts(crsp))

plot(ts(lcrsp))

4 AR models

The fact that the monthly return \(r_t\) of CRSP value-weighted index has a statistically significant lag-1 autocorrelation indicates that the lagged return \(r_{t-1}\) might be useful in predicting \(r_t\). A simple model that makes use of such predictive power is \[r_t = \phi_0 + \phi_1 r_{t-1} + a_t,\] where \(\{a_t\}\) is assumed to be a white noise series with mean zero and variance \(\sigma_a^2\). In the time series literature, this model is referred to as an autoregressive (AR) model of order 1 or simply an AR(1) model. This simple model is also widely used in stochastic volatility modeling when \(r_t\) is replaced by its log volatility.

A straightforward generalization of the AR(1) model is the AR(p) model: \[r_t = \phi_0 + \phi_1r_{t-1} + \cdots + \phi_pr_{t-p} + a_t,\] where p is a nonnegative integer and \(\{a_t\}\) is defined as above.
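
As a small illustration of the AR(1) form (simulated data with a known coefficient, not one of the return series studied below), one can generate a series and recover \(\phi_1\) with arima:

set.seed(4)
x <- arima.sim(model = list(ar = 0.6), n = 500) # AR(1) with phi_1 = 0.6 and zero mean
fit <- arima(x, order = c(1, 0, 0))             # fit an AR(1) model
fit$coef                                        # ar1 estimate should be near 0.6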

4.1 Properties of AR Models

4.1.1 AR(1) model

  • Expectation

    Since \(E(a_t) = 0\), we obtain \[E(r_t) = \phi_0 + \phi_1E(r_{t-1}).\] Under the stationarity condition, \(E(r_t ) = E(r_{t-1}) = \mu\) and hence \[\mu = \phi_0 + \phi_1\mu\] or \[E(r_t) = \mu =\frac{ \phi_0}{1- \phi_1}.\] This result has two implications for \(r_t\). First, the mean of \(r_t\) exists if \(\phi_1 \neq 1\). Second, the mean of \(r_t\) is zero if and only if \(\phi_0 = 0\).

    Next, using \(\phi_0 = (1-\phi_1)\mu\), the AR(1) model can be rewritten as \[r_t = (1-\phi_1)\mu + \phi_1 r_{t-1} + a_t.\] This model is often used in the finance literature with \(\phi_1\) measuring the persistence of the dynamic dependence of an AR(1) time series.

  • Variance

    Using the fact that \(a_t\) is independent of \(r_{t-1}\) and taking the variance of the model, we obtain \[Var(r_t) = \phi_1^2Var(r_{t-1}) + \sigma_a^2,\] where \(\sigma_a^2\) is the variance of \(a_t\).

    Under the stationarity assumption, \(Var(r_t) = Var(r_{t-1})\), so that \[Var(r_t) =\frac{\sigma_a^2}{1-\phi_1^2},\] provided that \(\phi_1^2<1\). Consequently, the weak stationarity of an AR(1) model implies that \(|\phi_1|<1\). Actually, the necessary and sufficient condition for this AR(1) model to be weakly stationary is \(|\phi_1|<1\).

  • ACF

    Using the independence between \(a_t\) and \(r_{t-1}\), and taking expectation, we obtain \[E[a_t(r_t-\mu)] = E[a_tr_t] = (1-\phi_1)\mu E[a_t] + \phi_1 E[a_tr_{t-1}] + E[a_t^2] = E[a_t^2] = \sigma_a^2.\] Multiplying the mean-adjusted model \(r_t - \mu = \phi_1(r_{t-1}-\mu) + a_t\) by \(r_{t-\ell} - \mu\), taking expectation, and using the prior result, we have \[\gamma_0 = \phi_1\gamma_1+\sigma_a^2,\] \[\gamma_\ell = \phi_1\gamma_{\ell-1},\forall \ell > 0.\] Consequently, for a weakly stationary AR(1) model, we have \[Var(r_t) = \gamma_0 = \frac{\sigma_a^2}{1-\phi_1^2}.\] Therefore, the ACF of \(r_t\) satisfies \[\rho_\ell = \phi_1\rho_{\ell-1}, \forall \ell > 0.\] Because \(\rho_0 = 1\), we have \(\rho_\ell = \phi_1^\ell\); that is, the ACF of a weakly stationary AR(1) series decays exponentially, as verified in the sketch below.
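
The exponential decay \(\rho_\ell = \phi_1^\ell\) can be checked with ARMAacf, the stats function that returns theoretical ACFs of ARMA models; this is only a small numerical sketch with an illustrative value of \(\phi_1\):

phi1 <- 0.7
theo <- ARMAacf(ar = phi1, lag.max = 6)   # theoretical ACF of an AR(1) model
rbind(ARMAacf = theo, power = phi1^(0:6)) # identical rows, since rho_ell = phi1^ell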

4.1.2 AR(2) model

  • Form

\[r_t = \phi_0+\phi_1r_{t-1}+\phi_2 r_{t-2}+a_t.\]

  • Expectation

\[E(r_t) = \mu = \frac{\phi_0}{1-\phi_1-\phi_2}.\]

  • ACF

    The autocovariance function of an AR(2) model satisfies \[\gamma_\ell = \phi_1\gamma_{\ell-1}+\phi_2\gamma_{\ell-2}, \forall \ell > 0.\] This result is referred to as the moment equation of a stationary AR(2) model. Dividing the above equation by \(\gamma_0\), we have the property \[\rho_\ell = \phi_1\rho_{\ell-1}+\phi_2\rho_{\ell-2}, \forall \ell > 0.\]

    In particular, the lag-1 ACF satisfies \[\rho_1 = \phi_1\rho_0 + \phi_2\rho_{-1} = \phi_1\rho_0 + \phi_2\rho_{1}.\] Therefore, for a stationary AR(2) series \(r_t\), we have \(\rho_0=1,\) \[\rho_1 = \frac{\phi_1}{1-\phi_2},\] \[\rho_\ell = \phi_1\rho_{\ell-1}+\phi_2\rho_{\ell-2}, \forall \ell \ge 2.\]

    This result says that the ACF of a stationary AR(2) series satisfies the second-order difference equation \[ (1-\phi_1B-\phi_2B^2)\rho_\ell=0,\] where B is called the back-shift operator such that \(B\rho_\ell = \rho_{\ell-1}\).

Corresponding to the prior difference equation, there is a second-order polynomial equation: \[1-\phi_1x-\phi_2x^2 = 0.\]

In the time series literature, inverses of the two solutions are referred to as the characteristic roots of the AR(2) model. Denote the two characteristic roots by \(\omega_1\) and \(\omega_2\). If both \(\omega_i\) are real valued, then the second-order difference equation of the model can be factored as \((1 - \omega_1B)(1 - \omega_2B)\) and the AR(2) model can be regarded as an AR(1) model operating on top of another AR(1) model. The ACF of \(r_t\) is then a mixture of two exponential decays. If \(\phi_1^2 + 4\phi_2<0\), then \(\omega_1\) and \(\omega_2\) are complex numbers (called a complex-conjugate pair), and the plot of the ACF of \(r_t\) would show a picture of damping sine and cosine waves. Complex characteristic roots are important in business and economic applications because they give rise to the behavior of business cycles; it is then common for economic time series models to have complex-valued characteristic roots. For an AR(2) model with a pair of complex characteristic roots, the average length of the stochastic cycles is \[k = \frac{2\pi}{\cos^{-1}[\phi_1/(2\sqrt{-\phi_2})]},\] where the cosine inverse is stated in radians. If one writes the complex characteristic roots as \(a\pm bi\), where \(i = \sqrt{-1}\), then we have \(\phi_1 = 2a\), \(\phi_2 = -(a^2+b^2)\), and \[k = \frac{2\pi}{\cos^{-1}(a/\sqrt{a^2+b^2})},\] where \(\sqrt{a^2+b^2}\) is the absolute value (modulus) of \(a\pm bi\).
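
A short sketch (with illustrative coefficients, not taken from the text) shows the damped sine-wave ACF produced by complex characteristic roots and the corresponding average cycle length:

phi1 <- 1.0; phi2 <- -0.7                    # phi1^2 + 4*phi2 < 0, so roots are complex
theo_acf <- ARMAacf(ar = c(phi1, phi2), lag.max = 30)
plot(0:30, theo_acf, type = "h", xlab = "lag", ylab = "ACF") # damped sine/cosine pattern
k <- 2 * pi / acos(phi1 / (2 * sqrt(-phi2))) # average length of the stochastic cycles
k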

4.1.3 AR(p) model

The results of the AR(1) model can readily be generalized to the general AR(p) model. The mean of a stationary series is \[E(r_t) = \frac{\phi_0}{1-\phi_1-\cdots-\phi_p}\] provided that the denominator is not zero. The associated characteristic equation of the model is \[1 - \phi_1x-\phi_2x^2-\cdots-\phi_px^p = 0\] If all the solutions of this equation are greater than 1 in modulus, then the series \(r_t\) is stationary. Again, inverses of the solutions are the characteristic roots of the model. Thus, stationarity requires that all characteristic roots are less than 1 in modulus. For a stationary AR(p) series, the ACF satisfies the difference equation \[(1 - \phi_1B-\phi_2B^2-\cdots-\phi_pB^p)\rho_\ell= 0, \forall \ell>0. \] The plot of ACF of a stationary AR(p) model would then show a mixture of damping sine and cosine patterns and exponential decays depending on the nature of its characteristic roots.

4.2 An example

As an illustration, consider the quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991. Here we simply employ an AR(3) model for the data.

gnp=scan(file='dgnp82.txt') # Load data 
#To create a time-series object
gnp1=ts(gnp,frequency=4,start=c(1947,2)) 
plot(gnp1)
points(gnp1,pch='*')

m1=ar(gnp,method="mle") # Find the AR order
m1$order # An AR(3) is selected based on AIC
## [1] 3
m2=arima(gnp,order=c(3,0,0)) # Estimation
m2
## 
## Call:
## arima(x = gnp, order = c(3, 0, 0))
## 
## Coefficients:
##          ar1     ar2      ar3  intercept
##       0.3480  0.1793  -0.1423     0.0077
## s.e.  0.0745  0.0778   0.0745     0.0012
## 
## sigma^2 estimated as 9.427e-05:  log likelihood = 565.84,  aic = -1121.68
# In R, "intercept" denotes the mean of the series.
# Therefore, the constant term is obtained below: 
(1-.348-.1793+.1423)*0.0077
## [1] 0.0047355
sqrt(m2$sigma2) # Residual standard error
## [1] 0.009709322
p1=c(1,-m2$coef[1:3]) # Characteristic equation
roots=polyroot(p1) # Find solutions
roots
## [1]  1.590253+1.063882i -1.920152+0.000000i  1.590253-1.063882i
Mod(roots) # Compute the absolute values of the solutions
## [1] 1.913308 1.920152 1.913308
# To compute average length of business cycles:
k=2*pi/acos(1.590253/1.913308) 
k
## [1] 10.65638

4.3 Identifying AR Models in Practice

In application, the order p of an AR time series is unknown. It must be specified empirically. This is referred to as the order determination (or order specification) of AR models, and it has been extensively studied in the time series literature. Two general approaches are available for determining the value of p. The first approach is to use the partial autocorrelation function, and the second approach uses some information criteria.

4.3.1 Partial Autocorrelation Function (PACF)

The PACF of a stationary time series is a function of its ACF and is a useful tool for determining the order p of an AR model. A simple, yet effective way to introduce PACF is to consider the following AR models in consecutive orders: \[r_t = \phi_{0,1} + \phi_{1,1} r_{t-1} + e_{1t},\] \[r_t = \phi_{0,2} + \phi_{1,2} r_{t-1}+ \phi_{2,2} r_{t-2} + e_{2t},\] \[r_t = \phi_{0,3} + \phi_{1,3} r_{t-1}+ \phi_{2,3} r_{t-2}+ \phi_{3,3} r_{t-3} + e_{3t},\] \[\vdots\] where \(\phi_{0,j}, \phi_{i,j}\),and \(\{e_{jt} \}\) are, respectively, the constant term, the coefficient of \(r_{t-i}\),and the error term of an AR(j) model. These models are in the form of a multiple linear regression and can be estimated by the least-squares method. As a matter of fact, they are arranged in a sequential order that enables us to apply the idea of partial F test in multiple linear regression analysis. The estimate \(\hat{\phi}_{1,1}\) of the first equation is called the lag-1 sample PACF of \(r_t\) . The estimate \(\hat{\phi}_{2,2}\) of the second equation is the lag-2 sample PACF of \(r_t\) . The estimate \(\hat{\phi}_{3,3}\) of the third equation is the lag-3 sample PACF of \(r_t\), and so on.

From the definition, the lag-2 PACF \(\hat{\phi}_{2,2}\) shows the added contribution of \(r_{t-2}\) to \(r_t\) over the AR(1) model \(r_t = \phi_0 + \phi_1r_{t-1} + e_{1t}\). The lag-3 PACF shows the added contribution of \(r_{t-3}\) to \(r_t\) over an AR(2) model, and so on. Therefore, for an AR(p) model, the lag-p sample PACF should not be zero, but \(\hat{\phi}_{j,j}\) should be close to zero for all \(j > p\). We make use of this property to determine the order p. For a stationary Gaussian AR(p) model, it can be shown that the sample PACF has the following properties:

  • \(\hat{\phi}_{p,p}\) converges to \(\phi_p\) as the sample size T goes to infinity

  • \(\hat{\phi}_{\ell, \ell}\) converges to zero for all \(\ell > p\).

  • The asymptotic variance of \(\hat{\phi}_{\ell, \ell}\) is \(1/T\) for \(\ell>p\).

These results say that, for an AR(p) series, the sample PACF cuts off at lag p.
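
A simulation sketch (separate from the CRSP example that follows) makes the cut-off visible: the sample PACF of a simulated AR(2) series has clear spikes at lags 1 and 2 and stays within the error bands afterwards.

set.seed(5)
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 1000) # an AR(2) series
pacf(x, lag.max = 10) # spikes at lags 1-2; roughly within +/- 2/sqrt(T) beyond lag 2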

As an example, consider the monthly simple returns of CRSP value-weighted index from January 1926 to December 2008. The following result gives the first 12 lags of sample PACF of the series. With T = 996, the asymptotic standard error of the sample PACF is approximately 0.032. Therefore, using the 5% significance level, we identify an AR(3) or AR(9) model for the data (i.e., p = 3 or 9). If the 1% significance level is used, we specify an AR(3) model.

pacf(crsp, lag.max = 12)

pacf(crsp, lag.max = 12, plot = FALSE)
## 
## Partial autocorrelations of series 'crsp', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.115 -0.030 -0.102  0.033  0.062 -0.050  0.031  0.052  0.063  0.005 
##     11     12 
## -0.005  0.011

The function pacf is an alias for acf, except with the default type of “partial”:

pacf(x, lag.max, plot, na.action, ...)

4.3.2 Information Criteria

There are several information criteria available to determine the order p of an AR process. All of them are likelihood based. For example, the well-known Akaike information criterion (AIC) (Akaike, 1973) is defined as \[AIC = \frac{-2}{T}\ln(likelihood) + \frac{2}{T}\times(number~of~parameters),\] where the likelihood function is evaluated at the maximum-likelihood estimates and T is the sample size. For a Gaussian AR(\(\ell\)) model, AIC reduces to \[AIC(\ell) = \ln(\tilde{\sigma}_\ell^2) + \frac{2\ell}{T},\] where \(\tilde{\sigma}_\ell^2\) is the maximum-likelihood estimate of \(\sigma_a^2\), which is the variance of \(a_t\), and T is the sample size. The first term of the AIC measures the goodness of fit of the AR(\(\ell\)) model to the data, whereas the second term is called the penalty function of the criterion because it penalizes a candidate model by the number of parameters used. Different penalty functions result in different information criteria.

Another commonly used criterion function is the Schwarz-Bayesian information criterion (BIC). For a Gaussian AR(\(\ell\)) model, the criterion is \[BIC(\ell) = \ln(\tilde{\sigma}_\ell^2) + \frac{\ell\ln(T)}{T}.\] The penalty for each parameter used is 2 for AIC and \(\ln(T)\) for BIC. Thus, compared with AIC, BIC tends to select a lower-order AR model when the sample size is moderate or large.

To use AIC to select an AR model in practice, one computes \(AIC(\ell)\) for \(\ell = 0,\cdots,P\), where P is a prespecified positive integer, and selects the order k that has the minimum AIC value. The same rule applies to BIC.
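
In R, one way to do this is to fit AR(\(\ell\)) models over a range of orders and compare the criteria with the generic AIC and BIC functions. Note that these report \(-2\ln(likelihood)\) plus the penalty without dividing by T, which selects the same order as the formulas above; the series and maximum order in this sketch are illustrative assumptions, not the data analyzed below.

set.seed(6)
x <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 400) # example series
P <- 6                                                   # maximum order considered
crit <- t(sapply(0:P, function(p) {
  fit <- arima(x, order = c(p, 0, 0))
  c(p = p, AIC = AIC(fit), BIC = BIC(fit))
}))
crit
crit[which.min(crit[, "AIC"]), "p"] # order selected by AIC
crit[which.min(crit[, "BIC"]), "p"] # order selected by BIC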

The following result also gives the AIC and BIC for \(p = 1,\cdots,12\). The AIC values are close to each other with minimizer occurring at \(p = 9\), suggesting that an AR(9) model is preferred by the criterion. The BIC, on the other hand, attains its minimum value -5.833 at \(p = 1\) with -5.831 as a close second at \(p = 3\). Thus, the BIC selects an AR(1) model for the value-weighted return series.

Different approaches or criteria to order determination may result in different choices of p. There is no evidence to suggest that one approach outperforms the other in a real application. Substantive information of the problem under study and simplicity are two factors that also play an important role in choosing an AR model for a given time series.

ord = ar(crsp, method = "mle")
ord$aic # The differences in AIC between each model and the best-fitting model.
##         0         1         2         3         4         5         6 
## 22.329967 10.990260 12.066700  3.350972  4.365413  2.462650  1.960128 
##         7         8         9        10        11        12 
##  3.041666  2.243258  0.000000  1.966641  3.942486  5.811573
ord$order
## [1] 9
ord
## 
## Call:
## ar(x = crsp, method = "mle")
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  0.1167  -0.0112  -0.1126   0.0217   0.0735  -0.0452   0.0254   0.0462  
##       9  
##  0.0660  
## 
## Order selected 9  sigma^2 estimated as  0.002831

4.4 Model Checking

A fitted model must be examined carefully to check for possible model inadequacy. If the model is adequate, then the residual series should behave as a white noise. The ACF and the Ljung-Box statistics of the residuals can be used to check the closeness of \(\hat{a}_t\) to a white noise. For an \(AR(p)\) model, the Ljung-Box statistic Q(m) follows asymptotically a chi-squared distribution with \(m-g\) degrees of freedom, where \(g\) denotes the number of AR coefficients used in the model. The adjustment in the degrees of freedom is made based on the number of constraints added to the residuals \(\hat{a}_t\) from fitting the \(AR(p)\) to an \(AR(0)\) model. If a fitted model is found to be inadequate, it must be refined. For instance, if some of the estimated AR coefficients are not significantly different from zero, then the model should be simplified by trying to remove those insignificant parameters. If residual ACF shows additional serial correlations, then the model should be extended to take care of those correlations.

Most time series packages do not adjust the degrees of freedom when applying the Ljung-Box statistics Q(m) to a residual series. This is understandable when \(m \le g\). Consider the residual series of the fitted AR(3) model for the monthly value-weighted simple returns. We have \(Q(12) = 16.35\) with a p value 0.060 based on its asymptotic chi-squared distribution with 9 degrees of freedom. Thus, the null hypothesis of no residual serial correlation in the first 12 lags is barely not rejected at the 5% level. However, since the lag-2 AR coefficient is not significant at the 5% level, one can refine the model as \[r_t = 0.0088 + 0.114r_{t-1} - 0.106r_{t-3}+a_t, \quad \hat{\sigma}_a = 0.0536,\] where all the estimates are now significant at the 1% level. The residual series gives \(Q(12) = 16.83\) with a p value 0.078 (based on \(\chi_{10}^2\)). The model is adequate in modeling the dynamic linear dependence of the data.

vw = read.table('m-ibm3dx2608.txt', header=TRUE)[,3]
m3 = arima(vw,order=c(3,0,0))
m3
## 
## Call:
## arima(x = vw, order = c(3, 0, 0))
## 
## Coefficients:
##          ar1      ar2      ar3  intercept
##       0.1158  -0.0187  -0.1042     0.0089
## s.e.  0.0315   0.0317   0.0317     0.0017
## 
## sigma^2 estimated as 0.002875:  log likelihood = 1500.86,  aic = -2991.73
(1-.1158+.0187+.1042)*mean(vw) # Compute the intercept phi(0)
## [1] 0.008967611
sqrt(m3$sigma2) # Compute standard error of residuals
## [1] 0.0536189
(1-pnorm(abs(m3$coef)/sqrt(diag(m3$var.coef))))*2 # p value of parameters
##          ar1          ar2          ar3    intercept 
## 2.372637e-04 5.546234e-01 1.029105e-03 1.145237e-07
Box.test(m3$residuals,lag=12,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  m3$residuals
## X-squared = 16.352, df = 12, p-value = 0.1756
pv=1-pchisq(16.35,9) # Compute p-value using 9 degrees of freedom
pv
## [1] 0.05992276
# The subcommand 'fixed' is used to fix parameter values, where NA denotes
# estimation and 0 means fixing the parameter to 0. The ordering of the
# parameters can be found using m3$coef.
m3=arima(vw,order=c(3,0,0),fixed=c(NA,0,NA,NA))
## Warning in arima(vw, order = c(3, 0, 0), fixed = c(NA, 0, NA, NA)): some AR
## parameters were fixed: setting transform.pars = FALSE
m3
## 
## Call:
## arima(x = vw, order = c(3, 0, 0), fixed = c(NA, 0, NA, NA))
## 
## Coefficients:
##          ar1  ar2      ar3  intercept
##       0.1136    0  -0.1063     0.0089
## s.e.  0.0313    0   0.0315     0.0017
## 
## sigma^2 estimated as 0.002876:  log likelihood = 1500.69,  aic = -2993.38
(1-.1136+.1063)*.0089 # Compute phi(0)
## [1] 0.00883503
sqrt(m3$sigma2) # Compute residual standard error
## [1] 0.05362832
(1-pnorm(abs(m3$coef[-2])/sqrt(diag(m3$var.coef))))*2 # p value of parameters
##          ar1          ar3    intercept 
## 2.833035e-04 7.519576e-04 1.745776e-07
Box.test(m3$residuals,lag=12,type='Ljung')
## 
##  Box-Ljung test
## 
## data:  m3$residuals
## X-squared = 16.828, df = 12, p-value = 0.1562
pv=1-pchisq(16.83,10)
pv
## [1] 0.07821131

5 MA models

5.1 Introduction

We now turn to another class of simple models that are also useful in modeling return series in finance. These models are the moving-average (MA) models. There are several ways to introduce MA models. One approach is to treat the model as a simple extension of white noise series. Another approach is to treat the model as an infinite-order AR model with some parameter constraints. We adopt the second approach.

There is no particular reason, other than simplicity, to assume a priori that the order of an AR model is finite. We may entertain, at least in theory, an AR model with infinite order as \[r_t = \phi_0 + \phi_1r_{t-1} + \phi_2r_{t-2} + \cdots + a_t .\] However, such an AR model is not realistic because it has infinitely many parameters. One way to make the model practical is to assume that the coefficients \(\phi_i\)’s satisfy some constraints so that they are determined by a finite number of parameters. A special case of this idea is \[r_t = \phi_0 - \theta_1r_{t-1} - \theta_1^2r_{t-2} - \theta_1^3r_{t-3} - \cdots + a_t ,\] where the coefficients depend on a single parameter \(\theta_1\) via \(\phi_i = -\theta_1^i\) for \(i \ge 1\). For this model to be stationary, \(\theta_1\) must be less than 1 in absolute value; otherwise, \(\theta_1^i\) and hence the series will explode. Because \(|\theta_1| < 1\), we have \(\theta_1^i \to 0\) as \(i \to \infty\). Thus, the contribution of \(r_{t-i}\) to \(r_t\) decays exponentially as i increases. This is reasonable, as the dependence of a stationary series \(r_t\) on its lagged value \(r_{t-i}\), if any, should decay as i increases.

This model can be rewritten in a rather compact form. To see this, rewrite the model as \[r_t + \theta_1r_{t-1} + \theta_1^2 r_{t-2} + \cdots = \phi_0 + a_t .\] The model for \(r_{t-1}\) is then \[r_{t-1} + \theta_1r_{t-2} + \theta_1^2 r_{t-3} + \cdots = \phi_0 + a_{t-1} .\] Multiplying the second equation by \(\theta_1\) and subtracting it from the first, we obtain \[r_t = \phi_0(1 - \theta_1) + a_t - \theta_1a_{t-1},\] which says that, except for the constant term, \(r_t\) is a weighted average of the shocks \(a_t\) and \(a_{t-1}\). Therefore, the model is called an MA model of order 1 or MA(1) model for short. The general form of an MA(1) model is \[r_t = c_0 + a_t - \theta_1a_{t-1} ~~or~~ r_t = c_0 + (1 - \theta_1B)a_t ,\] where \(c_0\) is a constant and \(\{a_t \}\) is a white noise series. Similarly, an MA(2) model is in the form \[r_t = c_0 + a_t - \theta_1a_{t-1} - \theta_2a_{t-2},\] and an MA(q) model is \[r_t = c_0 + a_t - \theta_1a_{t-1} - \cdots - \theta_qa_{t-q},\] or \[r_t = c_0 + (1 - \theta_1B - \cdots - \theta_qB^q)a_t,\] where \(q >0\).

5.2 Properties of MA Models

We focus on the simple MA(1) and MA(2) models. The results of MA(q) models can easily be obtained by the same techniques.

5.2.1 Stationarity

Moving-average models are always weakly stationary because they are finite linear combinations of a white noise sequence for which the first two moments are time invariant. Taking expectation of the model, we have \[E(r_t ) = c_0,\] which is time invariant. Taking the variance of MA(1) model, we have \[Var(r_t ) = \sigma_a^2 + \theta_1^2 \sigma_a^2 = (1 + \theta_1^2 )\sigma_a^2 ,\] where we use the fact that \(a_t\) and \(a_{t-1}\) are uncorrelated. Again, \(Var(r_t )\) is time invariant. The prior discussion applies to general MA(q) models, and we obtain two general properties. First, the constant term of an MA model is the mean of the series [i.e., \(E(r_t ) = c_0\)]. Second, the variance of an MA(q) model is \[Var(r_t ) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2 )\sigma_a^2.\]

5.2.2 Autocorrelation Function

Assume for simplicity that \(c_0 = 0\) for an MA(1) model. Multiplying the model by \(r_{t-\ell}\), we have \[r_{t-\ell}r_t = r_{t-\ell}a_t - \theta_1r_{t-\ell}a_{t-1}.\] Taking expectation, we obtain \[\gamma_1 =-\theta_1\sigma_a^2 ~~and~~ \gamma_\ell = 0, ~for~ \ell>1.\] Using the prior result and the fact that \(Var(r_t ) = (1 + \theta_1^2 )\sigma_a^2\), we have \[\rho_0 = 1, ~\rho_1 = \frac{-\theta_1}{1+\theta_1^2}, ~\rho_\ell = 0, ~for~\ell>1.\] Thus, for an MA(1) model, the lag-1 ACF is not zero, but all higher order ACFs are zero. In other words, the ACF of an MA(1) model cuts off at lag 1. For the MA(2) model, the autocorrelation coefficients are \[\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1+\theta_1^2+\theta_2^2}, ~\rho_2 = \frac{-\theta_2}{1+\theta_1^2+\theta_2^2}, ~\rho_\ell = 0, ~for~\ell>2.\] Here the ACF cuts off at lag 2. This property generalizes to other MA models. For an MA(q) model, the lag-q ACF is not zero, but \(\rho_\ell = 0\) for \(\ell>q\). Consequently, an MA(q) series is only linearly related to its first q-lagged values and hence is a “finite-memory” model.
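
The cut-off can be verified numerically. One caveat for the sketch below: R's arima.sim and ARMAacf write the MA part with plus signs, \(r_t = c_0 + a_t + \theta_1a_{t-1} + \cdots\), so the signs of the coefficients are flipped relative to the convention used above. The coefficients here are illustrative, not estimates from the data.

set.seed(7)
theta <- c(0.8, -0.4)              # MA(2) coefficients in R's "+" convention
x <- arima.sim(model = list(ma = theta), n = 1000)
ARMAacf(ma = theta, lag.max = 5)   # theoretical ACF is zero beyond lag 2
acf(x, lag.max = 10)               # sample ACF cuts off after lag 2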

5.2.3 Invertibility

Rewriting a zero-mean MA(1) model as \(a_t = r_t + \theta_1a_{t-1}\), one can use repeated substitutions to obtain \[a_t = r_t + \theta_1r_{t-1} + \theta_1^2 r_{t-2} + \theta_1^3 r_{t-3} + \cdots .\] This equation expresses the current shock \(a_t\) as a linear combination of the present and past returns. Intuitively, \(\theta_1^j\) should go to zero as j increases because the remote return \(r_{t-j}\) should have very little impact on the current shock, if any. Consequently, for an MA(1) model to be plausible, we require \(|\theta_1| < 1\). Such an MA(1) model is said to be invertible. If \(|\theta_1|= 1\), then the MA(1) model is noninvertible.

5.3 Identifying MA Order

The ACF is useful in identifying the order of an MA model. For a time series \(r_t\) with ACF \(\rho_\ell\), if \(\rho_q \ne 0\), but \(\rho_\ell = 0\) for \(\ell>q\), then \(r_t\) follows an MA(q) model.

The following figure shows the time plot of monthly simple returns of the CRSP equal-weighted index from January 1926 to December 2008 and the sample ACF of the series.

library(zoo) # provides zoo() for building the date-indexed series below
ts_index = seq(as.Date("1926/1/30"), by = "month", length.out = length(crsp))
ts_crsp = zoo(crsp, ts_index)
plot(ts_crsp)

acf(crsp)

The two dashed lines shown on the ACF plot denote the two standard error limits. It is seen that the series has significant ACF at lags 1, 3, and 9. There are some marginally significant ACF at higher lags, but we do not consider them here. Based on the sample ACF, the following MA(9) model \[r_t = c_0+a_t-\theta_1a_{t-1} - \theta_3a_{t-3} - \theta_9a_{t-9}\] is identified for the series. Note that, unlike the sample PACF, sample ACF provides information on the nonzero MA lags of the model.

5.3.1 Estimation

arima(crsp, order = c(0,0,9), fixed = c(NA, 0, NA, 0, 0, 0, 0, 0, NA, NA))
## 
## Call:
## arima(x = crsp, order = c(0, 0, 9), fixed = c(NA, 0, NA, 0, 0, 0, 0, 0, NA, 
##     NA))
## 
## Coefficients:
##          ma1  ma2      ma3  ma4  ma5  ma6  ma7  ma8     ma9  intercept
##       0.1108    0  -0.1174    0    0    0    0    0  0.0743     0.0089
## s.e.  0.0304    0   0.0324    0    0    0    0    0  0.0310     0.0018
## 
## sigma^2 estimated as 0.002855:  log likelihood = 1504.27,  aic = -2998.55

The function ccf plots the cross-correlation function for two time series:

ccf(x, y, lag.max = NULL, type = c("correlation", "covariance"),
    plot = TRUE, na.action = na.fail, ...)

6 Time Series Models

Time series models are a little different from other models that we’ve seen in R. With most other models, the goal is to predict a value (the response variable) from a set of other variables (the predictor variables). Usually, we explicitly assume that there is no autocorrelation: that the sequence of observations does not matter.

With time series, we assume the opposite: we assume that previous observations help predict future observations.

To fit an autoregressive model to a time series, use the function ar:

ar(x, aic = TRUE, order.max = NULL, 
   method=c("yule-walker", "burg", "ols", "mle", "yw"), 
   na.action, series, ...)

Here is a description of the arguments to ar.

  • x: A time series.
  • aic: A logical value that specifies whether the Akaike information criterion is used to choose the order of the model. Default: TRUE.
  • order.max: A numeric value specifying the maximum order of the model to fit. Default: NULL.
  • method: A character value that specifies the method to use for fitting the model. Specify method=“yw” (or method=“yule-walker”) for the Yule-Walker method, method=“burg” for the Burg method, method=“ols” for ordinary least squares, or method=“mle” for maximum likelihood estimation. Default: c(“yule-walker”, “burg”, “ols”, “mle”, “yw”).
  • na.action: A function that specifies how to handle missing values.
  • series: A character vector of names for the series.
  • demean: A logical value specifying if a mean should be estimated during fitting.
  • var.method: Specifies the method used to estimate the innovations variance when method=“burg”.
  • ...: Additional arguments, depending on method.

The ar function actually calls one of four other functions, depending on the fit method chosen: ar.yw, ar.burg, ar.ols, or ar.mle. As an example, let’s fit an autoregressive model to the turkey price data:

library(nutshell) # provides the turkey.price.ts data set
data(turkey.price.ts)
turkey.price.ts.ar <- ar(turkey.price.ts)
turkey.price.ts.ar
## 
## Call:
## ar(x = turkey.price.ts)
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  0.3353  -0.1868  -0.0024   0.0571  -0.1554  -0.0208   0.0914  -0.0658  
##       9       10       11       12  
## -0.0952   0.0649   0.0099   0.5714  
## 
## Order selected 12  sigma^2 estimated as  0.05182

You can use the model to predict future values. To do this, use the predict function. Here is the method for ar objects:

predict(object, newdata, n.ahead = 1, se.fit = TRUE, ...)

The argument object specifies the model object to use for prediction. You can use newdata to specify new data to use for prediction, or n.ahead to specify a number of periods ahead to predict. The argument se.fit specifies whether to return standard errors of the prediction error.

Here is a forecast for the next 12 months for turkey prices:

predict(turkey.price.ts.ar,n.ahead=12)
## $pred
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2008                                         1.8827277 1.7209182 1.7715016
## 2009 1.5439290 1.6971933 1.5849406 1.7800358                              
##            Aug       Sep       Oct       Nov       Dec
## 2008 1.9416776 1.7791961 1.4822070 0.9894343 1.1588863
## 2009                                                  
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2008                                         0.2276439 0.2400967 0.2406938
## 2009 0.2450732 0.2470678 0.2470864 0.2480176                              
##            Aug       Sep       Oct       Nov       Dec
## 2008 0.2415644 0.2417360 0.2429339 0.2444610 0.2449850
## 2009

To take a look at a forecast from an autoregressive model, you can use the function ts.plot. This function plots multiple time series on a single chart, even if the times are not overlapping. You can specify colors, line types, or other characteristics of each series as vectors; the \(i\)th place in the vector determines the property for the \(i\)th series.

Here is how to plot the turkey price time series as a solid line, and a projection 24 months into the future as a dashed line:

ts.plot(turkey.price.ts, predict(turkey.price.ts.ar,n.ahead=24)$pred, lty=c(1:2))

You can also fit autoregressive integrated moving average (ARIMA) models in R using the arima function:

arima(x, order = c(0, 0, 0), 
      seasonal = list(order = c(0, 0, 0), period = NA),
      xreg = NULL, include.mean = TRUE,
      transform.pars = TRUE, 
      fixed = NULL, init = NULL, 
      method = c("CSS-ML", "ML", "CSS"), 
      n.cond, optim.method = "BFGS", 
      optim.control = list(), kappa = 1e6)

Here is a description of the arguments to arima.

  • x: A time series.
  • order: A numeric vector (p, d, q), where p is the AR order, d is the degree of differencing, and q is the MA order. Default: c(0, 0, 0).
  • seasonal: A list specifying the seasonal part of the model. The list contains two parts: the order and the period. Default: list(order = c(0, 0, 0), period = NA).
  • xreg: An (optional) vector or matrix of external regressors (with the same number of rows as x). Default: NULL.
  • include.mean: A logical value specifying whether the model should include a mean/intercept term. Default: TRUE.
  • transform.pars: A logical value specifying whether the AR parameters should be transformed to ensure that they remain in the region of stationarity. Default: TRUE.
  • fixed: An optional numeric vector specifying fixed values for parameters. (Only NA values are varied.) Default: NULL.
  • init: A numeric vector of initial parameter values. Default: NULL.
  • method: A character value specifying the fitting method to use. The default setting, method=“CSS-ML”, uses conditional sum of squares to find starting values, then maximum likelihood. Specify method=“ML” for maximum likelihood only, or method=“CSS” for conditional sum of squares only. Default: c(“CSS-ML”, “ML”, “CSS”).
  • n.cond: A numeric value indicating the number of initial values to ignore (only used for conditional sum of squares).

The arima function uses the optim function to fit models. You can use the result of an ARIMA model to smooth a time series with the tsSmooth function. For more information, see the help file for tsSmooth.