In this exercise, we will go over a time series regression model called the ARDL model. We will take advantage of the \(\text{dynlm}\) package to implement this model and use Okun’s law as an example. The ARDL model is actually two time series regression models combined, so we will briefly cover the Autoregressive (AR) portion of the model, as well as the Distributed Lag component of the model.
Unlike linear regression, which typically deals with cross-sectional data, time series regression differs in two ways: the observations have a natural ordering in time, and a change in one period can affect outcomes in subsequent periods, creating dynamic relationships between variables.
Given the dynamic nature of time series data, there are three main ways of modeling these dynamic relationships. The first is to let the dependent variable be a function of current and past values of an explanatory variable:
\[ \begin{aligned} y_{t} = f(x_{t}, x_{t-1}, x_{t-2}, \ldots) \end{aligned} \]
We can think of \((y_{t}, x_{t})\) as denoting the values of \(y\) and \(x\) in the current period; \(x_{t-1}\) denotes the value of \(x\) in the previous period, \(x_{t-2}\) denotes the value of \(x\) two periods ago, and so on. Because of these lagged effects, the above equation is called a distributed lag (DL) model.
A second way of capturing dynamic effects is to let the dependent variable depend on its own value in the previous period, as well as the current value of the explanatory variable:
\[ \begin{aligned} y_{t} = f(y_{t-1}, x_{t}) \end{aligned} \]
We can also combine the features of the two previous equations, giving a dynamic model with lagged values of both the dependent and independent variables, such as
\[ \begin{aligned} y_{t} = f(y_{t-1}, x_{t}, x_{t-1}, x_{t-2}, \ldots) \end{aligned} \]
Such models are called autoregressive distributed lag (ARDL) models, which is the model we are interested in implementing. As we have shown, the ARDL model is composed of an autoregressive component, in which the dependent variable is regressed on one or more of its past values, and a distributed lag component, in which the independent variable enters along with one or more of its lagged values.
Finally, dynamic effects can enter through the error term, with the error in one period carrying over into the next:
\[ \begin{aligned} y_{t} = f(x_{t}) + e_{t} && e_{t}=g(e_{t-1}) \end{aligned} \]
where the function \(e_{t}=g(e_{t-1})\) is used to denote the dependence of the error term on its value in the previous period. In this case, \(e_{t}\) is correlated with \(e_{t-1}\), and in such a scenario we say the errors are serially correlated or autocorrelated.
A key least squares assumption that carries over to time series data is that different observations, and hence different errors, are uncorrelated:
\[ \begin{aligned} \text{cov}(y_{t}, y_{s}) = \text{cov}(e_{t}, e_{s}) = 0 && \text{for } t \neq s \end{aligned} \]
The dynamic nature of the second, third, and fourth models introduced above implies correlation between \(y_{t}\) and \(y_{t-1}\), or between \(e_{t}\) and \(e_{t-1}\), so these models clearly violate this assumption.
An assumption that we will maintain throughout this exercise is that the variables in our equations are stationary, meaning they are not explosive, not trending, and not wandering aimlessly without returning to their mean. Stationarity is a topic in its own right, but for our purposes a stationary variable is simply one whose mean, variance, and other statistical properties remain constant over time.
The first dynamic relationship we consider is the distributed lag model introduced above, \(y_{t} = f(x_{t}, x_{t-1}, x_{t-2}, \ldots)\), with the additional assumptions that the relationship is linear and that, after \(q\) time periods, changes in \(x\) no longer have an impact on \(y\). Under these conditions, we have the multiple regression model
\[ \begin{aligned} y_{t} = \alpha + \beta_{0}x_{t} + \beta_{1}x_{t-1} + \beta_{2}x_{t-2} + \ldots + \beta_{q}x_{t-q} + e_{t} \end{aligned} \]
The above model can be treated in the same way as a multiple regression model; instead of a number of different explanatory variables, we have a number of different lags of the same explanatory variable. This equation can be very useful in two ways. First, it can be used for forecasting: given observations through period \(T\), the value of \(y\) in the next period is

\[ \begin{aligned} y_{T+1}=\alpha + \beta_{0}x_{T+1} + \beta_{1}x_{T} + \beta_{2}x_{T-1} + \ldots + \beta_{q}x_{T-q+1}+e_{T+1} \end{aligned} \]

Second, it can be used for policy analysis, because the coefficients tell us how a change in \(x\) in one period affects the expected value of \(y\) in current and later periods:

\[ \begin{aligned} \frac{\partial E(y_{t})}{\partial x_{t-s}} = \frac{\partial E(y_{t+s})}{\partial x_{t}} = \beta_{s} \end{aligned} \]
The effect of a one-unit change in \(x_{t}\) is distributed over the current and next \(q\) periods. The model is called a finite distributed lag model of order \(q\) because it is assumed that after a finite number of periods \(q\), changes in \(x\) no longer have an impact on \(y\). The coefficient \(\beta_{s}\) is called a distributed-lag weight or an \(s\)-period delayed multiplier. The coefficient \(\beta_{0}\) is called the impact multiplier. Adding up the first few coefficients gives the interim multipliers; for example, the interim multiplier for two periods is \((\beta_{0} + \beta_{1} + \beta_{2})\). The total multiplier is the final effect on \(y\) of a sustained increase in \(x\) after \(q\) or more periods have elapsed, given by \(\sum_{s=0}^{q} \beta_{s}\).
In distributed lag models, both \(y\) and \(x\) are typically random. That is, we do not know their values prior to sampling. We do not “set” output growth, for example, and then observe the resulting level of unemployment. To accommodate this, we assume that the \(x\)’s are random and that the error term \(e_{t}\) is independent of all \(x\)’s in the sample - past, current, and future. This assumption, in conjunction with the other multiple regression assumptions, is sufficient for the least squares estimator to be unbiased and to be BLUE, conditional on the \(x\)’s in the sample. In short, the assumptions of the distributed lag model are the usual multiple regression assumptions, with \(e_{t}\) assumed independent of all \(x\)’s.
At this point, we’ve covered the basics needed to apply our model to a real-world example: the finite distributed lag model applied to Okun’s Law. We won’t go into too much detail about what Okun’s Law is, as that is not within the scope of this exercise, but Arthur Melvin Okun was an economist who posited a relationship between the change in unemployment from one period to the next and the rate of growth of output in the economy. Mathematically, Okun’s Law can be expressed as
\[ \begin{aligned} U_{t}-U_{t-1}=-\gamma(G_{t}-G_{N}) \end{aligned} \]
where \(U_{t}\) is the unemployment rate in period \(t\), \(G_{t}\) is the growth rate of output in period \(t\), and \(G_{N}\) is the “normal” growth rate, which we assume is constant over time. We can rewrite the above equation in the more familiar notation of the multiple regression model by denoting the change in unemployment \(U_{t}-U_{t-1}\) as \(DU_{t}=\Delta U_{t}=U_{t}-U_{t-1}\). We then set \(\beta_{0} = -\gamma\) and \(\alpha = \gamma G_{N}\). Adding an error term to our equation yields
\[ \begin{aligned} DU_{t}=\alpha + \beta_{0}G_{t} + e_{t} \end{aligned} \]
Changes in output are likely to have a distributed-lag effect on unemployment; not all of the effect will take place instantaneously. We can therefore further expand our equation to
\[ \begin{aligned} DU_{t} = \alpha + \beta_{0}G_{t} + \beta_{1}G_{t-1} + \beta_{2}G_{t-2} + \ldots + \beta_{q}G_{t-q} + e_{t} \end{aligned} \]
Our initial data set looks something like this
|  g  |  u  |
|:---:|:---:|
| 1.4 | 7.3 |
| 2.0 | 7.2 |
| 1.4 | 7.0 |
| 1.5 | 7.0 |
| 0.9 | 7.2 |
| 1.5 | 7.0 |
| 1.2 | 6.8 |
| 1.5 | 6.6 |
| 1.6 | 6.3 |
| 1.7 | 6.0 |
where the variable g represents the quarterly growth rate of output and the variable u represents the unemployment rate in the corresponding quarter. (The change in unemployment, \(DU_{t}\), will be computed from u when we estimate the model.)
In order to take advantage of the \(\text{dynlm}\) package, we need to convert our variables to time series objects. We can simply do this in R by typing the following
okun$g <- ts(okun$g, start=c(1985,2), frequency=4) # The data is quarterly and the earliest observation in our data set is 1985 Q2.
okun$u <- ts(okun$u, start=c(1985, 2), frequency=4)
To check that we have properly converted our data to time series objects, we can print the growth variable g.
## Qtr1 Qtr2 Qtr3 Qtr4
## 1985 1.4 2.0 1.4
## 1986 1.5 0.9 1.5 1.2
## 1987 1.5 1.6 1.7 2.5
## 1988 1.3 2.2 1.7 2.1
## 1989 2.1 1.7 1.5 0.9
## 1990 2.3 1.6 0.9 -0.1
## 1991 0.6 1.4 1.2 1.0
## 1992 1.6 1.7 1.5 1.6
## 1993 0.8 1.2 1.0 1.9
## 1994 1.5 1.9 1.2 1.6
## 1995 0.8 0.7 1.3 1.2
## 1996 1.3 2.1 1.2 1.7
## 1997 1.4 1.7 1.6 1.1
## 1998 1.1 1.1 1.7 2.0
## 1999 1.3 1.1 1.6 2.2
## 2000 1.1 2.5 0.7 1.1
## 2001 0.3 1.3 0.0 0.7
## 2002 1.2 1.0 0.9 0.6
## 2003 1.1 1.1 2.2 1.4
## 2004 1.6 1.6 1.5 1.6
## 2005 1.9 1.1 1.8 1.4
## 2006 2.1 1.2 0.8 1.2
## 2007 1.4 1.5 1.3 1.1
## 2008 0.3 0.9 0.3 -1.4
## 2009 -1.2 -0.2 0.8
You can see that the time range for our data set runs from Q2 of 1985 through Q3 of 2009. Before we proceed, let’s take a preliminary look at growth and unemployment during this time period. Growth during this period
ts.plot(okun$g, col='blue', ylab='Growth')
Unemployment rate during this time period
ts.plot(okun$u, col='red', ylab='Unemployment Rate')
Now we can harness the power of the \(\text{dynlm}\) package. Before we proceed, there are a couple of things to note. The main function used to estimate our model is the \(\text{dynlm}\) function. Within this function, \(\text{d()}\) can be used to specify the difference of a variable and \(\text{L()}\) can be used to compute the desired lags of a variable. By setting \(\text{L(g, 0:3)}\), for example, we are specifying coefficients for the growth rate in the current period and in each of the past three periods. Setting \(\text{d(u, 1)}\) means we wish to compute the change in the unemployment rate from the previous period. We’ll estimate models with lag lengths of \(q=2\) and \(q=3\) and compare the results.
library(dynlm)
okun.lag3 <- dynlm(d(u, 1) ~ L(g, 0:3), data=okun)
okun.lag2 <- dynlm(d(u, 1) ~ L(g, 0:2), data=okun)
Model estimates for lag length at \(q=3\)
##
## Time series regression with "ts" data:
## Start = 1986(1), End = 2009(3)
##
## Call:
## dynlm(formula = d(u, 1) ~ L(g, 0:3), data = okun)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.42952 -0.14294 0.01658 0.11120 0.42574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.580975 0.053889 10.781 < 2e-16 ***
## L(g, 0:3)0 -0.202053 0.033013 -6.120 2.39e-08 ***
## L(g, 0:3)1 -0.164535 0.035818 -4.594 1.41e-05 ***
## L(g, 0:3)2 -0.071556 0.035304 -2.027 0.0456 *
## L(g, 0:3)3 0.003303 0.036260 0.091 0.9276
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1743 on 90 degrees of freedom
## Multiple R-squared: 0.6524, Adjusted R-squared: 0.637
## F-statistic: 42.23 on 4 and 90 DF, p-value: < 2.2e-16
Model estimates for lag length at \(q=2\)
##
## Time series regression with "ts" data:
## Start = 1985(4), End = 2009(3)
##
## Call:
## dynlm(formula = d(u, 1) ~ L(g, 0:2), data = okun)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.42713 -0.13869 0.01658 0.11170 0.42382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.58356 0.04721 12.360 < 2e-16 ***
## L(g, 0:2)0 -0.20202 0.03238 -6.238 1.33e-08 ***
## L(g, 0:2)1 -0.16533 0.03354 -4.930 3.63e-06 ***
## L(g, 0:2)2 -0.07001 0.03310 -2.115 0.0371 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1726 on 92 degrees of freedom
## Multiple R-squared: 0.6539, Adjusted R-squared: 0.6427
## F-statistic: 57.95 on 3 and 92 DF, p-value: < 2.2e-16
So, how can we interpret these results? Looking at lag length \(q=2\), a 1% increase in the growth rate leads to a fall in the unemployment rate of 0.20 percentage points in the current quarter, a fall of 0.165 in the next quarter, and a fall of 0.07 two quarters later. These values are the impact multiplier and the one-quarter and two-quarter delay multipliers, respectively.
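To connect these numbers to the multiplier terminology introduced earlier, here is a minimal sketch that recovers the impact, interim, and total multipliers from the fitted \(q=2\) model. It assumes the okun.lag2 object estimated above; the lag coefficients follow the intercept in the coefficient vector returned by \(\text{coef()}\).

b.hat <- coef(okun.lag2)[-1]   # beta_0, beta_1, beta_2: impact and delay multipliers
cumsum(b.hat)                  # interim multipliers after 0, 1, and 2 periods
sum(b.hat)                     # total multiplier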
The effect of a change in the value of an explanatory variable is distributed over a number of future periods. If the equation errors are uncorrelated with each other and with the independent variable \(x\), the traditional least squares estimator and its associated testing procedures can be used. With time series data, however, successive observations are likely to be correlated, violating this assumption. When a variable exhibits correlation over time, it is said to be autocorrelated or serially correlated. Autocorrelation in the error can arise from an autocorrelated omitted variable, or it can arise if the dependent variable \(y\) is autocorrelated and this autocorrelation is not adequately explained by the \(x\)’s and their lags included in the equation.
The autocorrelation between a variable \(x\) and its values from the previous period \(x_{t-1}\) can be given by
\[ \begin{aligned} \rho_{1} = \frac{\text{cov}(x_{t}, x_{t-1})}{\sqrt{\text{var}(x_{t})\text{var}(x_{t-1})}} = \frac{\text{cov}(x_{t}, x_{t-1})}{\text{var}(x_{t})} \end{aligned} \]
The notation \(\rho_{1}\) is used to denote the population correlation between observations that are one period apart in time, known also as the population autocorrelation of order one.
The first-order sample autocorrelation for \(x\) is obtained from the above equation by replacing \(\text{cov}(x_{t}, x_{t-1})\) and \(\text{var}(x_{t})\) with their estimates
\[ \begin{aligned} \widehat{\text{cov}(x_{t}, x_{t-1})} = \frac{1}{T-1}\sum_{t=2}^{T}{(x_{t}-\bar{x})(x_{t-1}-\bar{x})} && \widehat{\text{var}(x_{t})} = \frac{1}{T-1}\sum_{t=1}^{T}{(x_{t}-\bar{x})^{2}} \end{aligned} \]
The index of summation in the formula for \(\widehat{\text{cov}(x_{t}, x_{t-1})}\) starts at \(t=2\) because we do not observe \(x_{0}\). Making the substitutions, and using \(r_{1}\) to denote the sample autocorrelation yields
\[ \begin{aligned} r_{1} = \frac{\sum_{t=2}^{T}(x_{t}-\bar{x})(x_{t-1}-\bar{x})}{\sum_{t=1}^{T} (x_{t}-\bar{x})^{2}} \end{aligned} \]
More generally, the \(k\)-th order sample autocorrelation for a series \(y\) that gives the correlation between observations that are \(k\) periods apart (i.e. the correlation between \(y_{t}\) and \(y_{t-k}\)) is given by
\[ \begin{aligned} r_{k} = \frac{\sum_{t=k+1}^{T} (y_{t}-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^{2}} \end{aligned} \]
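Before turning to R’s built-in function, here is a minimal sketch (assuming the okun data loaded above) that computes \(r_{k}\) directly from this formula, so you can verify it against the \(\text{acf()}\) output shown below.

x <- as.numeric(okun$g)
xbar <- mean(x)
n <- length(x)                 # T in the formula above
k <- 1                         # first-order sample autocorrelation
r.k <- sum((x[(k + 1):n] - xbar) * (x[1:(n - k)] - xbar)) / sum((x - xbar)^2)
r.k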
In practice, you do not need to compute these by hand. Simply use the \(\text{acf()}\) function and pass in a time series object. Using our Okun’s Law data, the autocorrelations for g are
acf(okun$g, type='correlation', plot=FALSE)$acf
## , , 1
##
## [,1]
## [1,] 1.000000000
## [2,] 0.494256756
## [3,] 0.410707303
## [4,] 0.154420500
## [5,] 0.200437884
## [6,] 0.090385381
## [7,] 0.024471113
## [8,] -0.030084345
## [9,] -0.082319780
## [10,] 0.044106611
## [11,] -0.021284834
## [12,] -0.086834634
## [13,] -0.204043262
## [14,] -0.144183665
## [15,] -0.069486482
## [16,] -0.064020474
## [17,] -0.081797420
## [18,] -0.084794592
## [19,] -0.076679285
## [20,] 0.001031665
This returns an array of the sample autocorrelations of g. You can ignore the first number, which is the lag-0 autocorrelation: \(g_{t}\) is of course perfectly correlated with itself. Thus, the sample autocorrelations of g for the first \(k=4\) lags are
\[ \begin{aligned} r_{1} = 0.494 \quad r_{2} = 0.411 \quad r_{3} = 0.154 \quad r_{4} = 0.2 \end{aligned} \]
In order to test whether an autocorrelation is significantly different from zero, let the \(k\)-th order population autocorrelation be denoted by \(\rho_{k}\). When the null hypothesis \(H_{0}:\rho_{k}=0\) is true, it turns out that \(r_{k}\) has an approximate normal distribution with zero mean and variance \(1/T\). Thus, a suitable test statistic is
\[ \begin{aligned} Z = \frac{r_{k}-0}{\sqrt{1/T}} = \sqrt{T}r_{k} \sim N(0,1) \end{aligned} \]
We reject the null if the test statistic \(Z_{k}\) is \(\geq\) 1.96 or \(Z_{k}\) \(\leq\) -1.96. Given \(T=98\) observations in our Okun’s Law data set, the test statistics for the first four lags are
\[ \begin{aligned} Z_{1} = \sqrt{98}\times0.494=4.89, && Z_{2}=\sqrt{98}\times0.411=4.07 \\ Z_{3} = \sqrt{98}\times0.154=1.52, && Z_{4}=\sqrt{98}\times0.200 =1.98 \end{aligned} \]
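These statistics are straightforward to reproduce with the autocorrelations we just computed; a quick sketch:

r <- acf(okun$g, type='correlation', plot=FALSE)$acf[2:5]   # r_1 through r_4
round(sqrt(length(okun$g)) * r, 2)                          # Z_1 through Z_4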
A useful tool for assessing the significance of autocorrelations is a diagrammatic representation called the correlogram. The correlogram, or sample autocorrelation function, plots the correlations between observations that are one period apart, two periods apart, and so on. We can easily construct a correlogram using the same \(\text{acf()}\) function we used previously to calculate the correlations.
acf(okun$g, type='correlation')
What’s nice about R’s correlogram is that it draws significance bands at approximately \(\pm 0.2\) (that is, \(\pm 1.96/\sqrt{T}\)), making it relatively easy to determine whether a particular lag is significantly different from zero. Recall that we reject the null hypothesis that \(\rho_{k}=0\) if \(Z_{k} \geq 1.96\) or \(Z_{k} \leq -1.96\). Be careful how you interpret the correlogram: the first line, at lag 0, is to be ignored, and the first lag begins with the second line. Looking at the correlogram, the autocorrelation at lag 3 is not significantly different from zero, which matches what we already computed (\(Z_{3}=1.52\)).
In the presence of serially correlated errors, the consequences of least squares estimation are similar to the consequences of ignoring heteroskedasticity: the least squares estimator is still linear and unbiased, but it is no longer best, and the usual standard errors are incorrect, which invalidates the usual hypothesis tests and confidence intervals.
By using HAC standard errors with least squares, we can avoid having to specify the precise nature of the autocorrelated error model that is required for an alternative estimator with a lower variance. For HAC standard errors to be valid, we need to assume that the autocorrelations go to zero as the time between observations increases (a condition necessary for stationarity), and we need a large sample, but it is not necessary to make a precise assumption about the autocorrelated error model.
To get a feel for how HAC standard errors are found, consider the simple regression model \(y_{t}=\beta_{1}+\beta_{2}x_{t}+e_{t}\). The variance of the least squares estimator \(b_{2}\) can be written as
\[ \begin{aligned} \text{var}(b_{2}) & = \sum_{t}{w_{t}^{2}\text{var}(e_{t})} + \sum_{t \neq s}{\sum{w_{t}w_{s}\text{cov}(e_{t}, e_{s})}} \\ & = \sum_{t}{w_{t}^{2}\text{var}(e_{t})} \Bigg[1+\frac{\sum_{t \neq s}{\sum{w_{t}w_{s}\text{cov}(e_{t}, e_{s})}}}{\sum_{t}{w_{t}^{2}\text{var}(e_{t})}}\Bigg] \end{aligned} \]
where \(w_{t}=(x_{t}-\bar{x})/\sum_{t}{(x_{t}-\bar{x})^{2}}\). When the errors are not correlated, \(\text{cov}(e_{t},e_{s})=0\) for \(t \neq s\), and the term in the square brackets is equal to one. The resulting expression \(\text{var}(b_{2})=\sum_{t}{w_{t}^{2}}\text{var}(e_{t})\) is the one used to find HC (heteroskedasticity-consistent) standard errors. If we denote the quantity in the square brackets as \(g\) and its estimate as \(\hat{g}\), then the relationship between the two estimated variances is
\[ \begin{aligned} \widehat{\text{var}_{HAC}(b_{2})}=\widehat{\text{var}_{HC}(b_{2})}\times \hat{g} \end{aligned} \]
The HAC variance estimate is equal to the HC variance estimate multiplied by an extra term that depends on the serial correlation in errors.
Calculating HAC standard errors in R requires the \(\text{lmtest}\) and \(\text{sandwich}\) packages. You can compute the HAC standard errors as follows
library(lmtest)
library(sandwich)
# phillips.fit is the simple Phillips curve regression of inflation (inf) on the
# change in the unemployment rate, d(u, 1), estimated with dynlm on the phillips data:
phillips.fit <- dynlm(inf ~ d(u, 1), data=phillips)
coeftest(phillips.fit, vcov=vcovHAC(phillips.fit))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.777621 0.094854 8.1980 1.824e-12 ***
## d(u, 1) -0.527864 0.304444 -1.7339 0.08644 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s compare the least squares standard errors to the HAC standard errors
\[ \begin{aligned} \widehat{INF}= & 0.7776 & - 0.5279DU \\ & (0.0658) & (0.2294) && & (\text{incorrect se}) \\ & (0.0949) & (0.3044) && & (\text{HAC se}) \end{aligned} \]
The HAC standard errors are larger than the least squares ones, implying that if we ignore the autocorrelation, we will overstate the reliability of the least squares estimates. The \(t\) statistics and corresponding \(p\)-values for testing the null hypothesis \(H_{0}:\beta_{2}=0\) are
\[ \begin{aligned} & t = -2.301 && & p = 0.0238 && & \text{(from LS standard errors)} \\ & t = -1.734 && & p = 0.0865 && & \text{(from HAC standard errors)} \end{aligned} \]
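To see where the two sets of standard errors come from, a minimal sketch that extracts both the conventional and the HAC standard errors from the phillips.fit object estimated above:

sqrt(diag(vcov(phillips.fit)))      # conventional least squares standard errors
sqrt(diag(vcovHAC(phillips.fit)))   # HAC standard errors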
Using least squares with HAC standard errors overcomes the negative consequences that autocorrelated errors have on the least squares standard errors, but it does not address the issue of finding an estimator that is better, in the sense that it has a lower variance. One way to proceed is to make an assumption about the model that generates the autocorrelated errors, and to derive an estimator compatible with that assumption. To introduce such a model (the same specification used in the Lagrange multiplier test for serially correlated errors), we model the correlation between \(e_{t}\) and \(e_{t-1}\) by writing \(e_{t}\) as dependent on \(e_{t-1}\) through the equation
\[ \begin{aligned} e_{t} = \rho e_{t-1} + v_{t} \end{aligned} \]
If we assume that the \(v_{t}\)’s are uncorrelated random errors with zero mean and constant variance, then our assumptions are
\[ \begin{aligned} E(v_{t})=0 \quad \text{var}(v_{t})=\sigma_{v}^{2} \quad \text{cov}(v_{t}, v_{s})=0 \quad \text{for } t \neq s \end{aligned} \]
With these assumptions, the model \(e_{t}=\rho e_{t-1} + v_{t}\) describes a first-order autoregressive model, abbreviated as an AR(1) model. It is called autoregressive because it can be viewed as a regression model in which \(e_{t}\) depends on its own lagged value, inducing autocorrelation. It is called first-order because the right-hand side variable is \(e_{t}\) lagged one period (i.e. \(e_{t-1}\)).
One way to estimate a regression model with serially correlated errors is to assume that those errors follow an AR(1) model and develop an estimation procedure relevant to that model.
One additional assumption we need to make about our AR(1) model, to ensure that \(e_{t}\) is stationary, is that \(\rho\) is less than one in absolute value. That is,
\[ \begin{aligned} -1 < \rho < 1 \end{aligned} \]
The mean and variance of \(e_{t}\) are
\[ \begin{aligned} E(e_{t})=0 \qquad \text{var}(e_{t})=\sigma_{e}^{2} = \frac{\sigma_{v}^{2}}{1-\rho^{2}} \end{aligned} \]
The AR(1) error \(e_{t}\) has zero mean and a variance that depends on the variance of \(v_{t}\) and the magnitude of \(\rho\). The larger the degree of autocorrelation (i.e. the closer \(\rho\) is to \(+1\) or \(-1\)), the larger the variance of \(e_{t}\). Additionally, since \(\sigma_{v}^{2}/(1-\rho^{2})\) is constant over time, \(e_{t}\) is homoskedastic. The covariance between the two errors that are \(k\) periods apart is
\[ \begin{aligned} \text{cov}(e_{t}, e_{t-k})=\frac{\rho^{k}\sigma_{v}^{2}}{1-\rho^{2}} \end{aligned} \]
Using the previous correlation formula \(\rho_{1} = \frac{\text{cov}(x_{t}, x_{t-1})}{\sqrt{\text{var}(x_{t})\text{var}(x_{t-1})}} = \frac{\text{cov}(x_{t}, x_{t-1})}{\text{var}(x_{t})}\), which we covered earlier, we can obtain the following equation
\[ \begin{aligned} \rho_{k} & =\text{corr}(e_{t}, e_{t-k})=\frac{\text{cov}(e_{t}, e_{t-k})}{\sqrt{\text{var}(e_{t})\text{var}(e_{t-k})}} = \frac{\text{cov}(e_{t}, e_{t-k})}{\text{var}(e_{t})} \\ & = \frac{\rho^{k}\sigma_{v}^{2}/(1-\rho^{2})}{\sigma_{v}^{2}/(1-\rho^{2})} \\ & = \rho^{k} \end{aligned} \]
The above formula states that \(\rho_{k}=\rho^{k}\). The correlation between two errors that are \(k\) periods apart (\(\rho_{k})\) is given by \(\rho\) raised to the power of \(k\). An interpretation of the unknown parameter \(\rho\) can be obtained by setting \(k=1\). Specifically
\[ \begin{aligned} \rho_{1} = \text{corr}(e_{t}, e_{t-1})=\rho \end{aligned} \]
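As a quick empirical check of the \(\rho_{k}=\rho^{k}\) result, here is a simulation sketch (not part of the original analysis): generate a long AR(1) series with \(\rho=0.6\) and compare its sample autocorrelations with \(\rho^{k}\).

set.seed(123)
e.sim <- arima.sim(model=list(ar=0.6), n=10000)   # simulate an AR(1) process with rho = 0.6
round(acf(e.sim, plot=FALSE)$acf[2:5], 3)         # sample autocorrelations r_1, ..., r_4
round(0.6^(1:4), 3)                               # theoretical values rho^k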
We can relate these results to the errors from our Phillips curve example. The sample autocorrelations of the least squares residuals for the first five lags are
## [1] 0.5486586 0.4557325 0.4332158 0.4204936 0.3390342
Now suppose the errors follow an AR(1) model where we have only one unknown parameter \(\rho\). In this case,
\[ \begin{aligned} \hat{\rho_{1}} = \hat{\rho} = r_{1} = 0.549 \end{aligned} \]
Imposing the structure of the AR(1) error model leads to the following estimates at longer lags
phillips.acf <- acf(phillips.fit$residuals, type='correlation', plot=FALSE)$acf
# phillips.acf[2] is r_1 = 0.549, our estimate of rho; raise it to the powers 2 through 6
for (i in 2:6){
  print(phillips.acf[2]^i)
}
## [1] 0.3010263
## [1] 0.1651607
## [1] 0.09061684
## [1] 0.04971771
## [1] 0.02727805
That is,
\[ \begin{aligned} & \hat{\rho}_{2}=\hat{\rho}^{2}=(0.549)^{2}=.301 \\ & \hat{\rho}_{3}=\hat{\rho}^{3}=(0.549)^{3}=.165 \\ & \hat{\rho}_{4}=\hat{\rho}^{4}=(0.549)^{4}=.091 \\ & \hat{\rho}_{5}=\hat{\rho}^{5}=(0.549)^{5}=.05 \\ & \hat{\rho}_{6}=\hat{\rho}^{6}=(0.549)^{6}= 0.027 \end{aligned} \]
Notice that the values are considerably smaller than the unrestricted estimates of the correlogram, suggesting that the AR(1) assumption might not be adequate.
In this section, we develop an estimator for the regression model with AR(1) errors. First let us summarize the model and its assumptions
\[ \begin{aligned} y_{t} = \beta_{1} + \beta_{2}x_{t} + e_{t} \quad \text{with} \quad e_{t}=\rho e_{t-1} + v_{t} \end{aligned} \]
The \(v_{t}\)’s are uncorrelated random variables with zero mean and constant variance \(\sigma_{v}^{2}\).
\[ \begin{aligned} E(v_{t})=0 \qquad \text{var}(v_{t})=\sigma_{v}^{2} \qquad \text{cov}(v_{t},v_{s})=0 \quad \text{for } t \neq s \end{aligned} \]
Substituting \(e_{t}=\rho e_{t-1} + v_{t}\) into \(y_{t} = \beta_{1} + \beta_{2}x_{t} + e_{t}\) yields
\[ \begin{aligned} y_{t} = \beta_{1} + \beta_{2}x_{t} + \rho e_{t-1} + v_{t} \end{aligned} \]
From the regression equation, the error in the previous period can be written as
\[ \begin{aligned} e_{t-1} = y_{t-1} - \beta_{1} - \beta_{2}x_{t-1} \end{aligned} \]
Multiplying the above equation by \(\rho\) gives us
\[ \begin{aligned} \rho e_{t-1} = \rho y_{t-1} - \rho\beta_{1}-\rho\beta_{2}x_{t-1} \end{aligned} \]
Substituting the above equation into \(y_{t} = \beta_{1} + \beta_{2}x_{t} + \rho e_{t-1} + v_{t}\) yields
\[ \begin{aligned} y_{t}=\beta_{1}(1-\rho) + \beta_{2}x_{t} + \rho y_{t-1} - \rho\beta_{2}x_{t-1} + v_{t} \end{aligned} \]
We’ve transformed the original model with the autocorrelated term \(e_{t}\) into a new model given by the above equation that has an error term \(v_{t}\) that is uncorrelated over time.
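Note that this transformed equation is nonlinear in the parameters (the coefficient on \(x_{t-1}\) is the product \(-\rho\beta_{2}\)), so it cannot be estimated directly with lm or dynlm. As an illustration only (not from the original text), here is a sketch of how it could be estimated for the Phillips curve data using nonlinear least squares; the starting values are rough guesses based on our earlier estimates.

u   <- as.numeric(phillips$u)
inf <- as.numeric(phillips$inf)
du  <- c(NA, diff(u))                      # DU_t = U_t - U_{t-1}
n   <- length(inf)
dat <- data.frame(y = inf[3:n], y1 = inf[2:(n-1)],
                  x = du[3:n],  x1 = du[2:(n-1)])
nls(y ~ b1*(1 - rho) + b2*x + rho*y1 - rho*b2*x1,
    data = dat, start = list(b1 = 0.8, b2 = -0.5, rho = 0.5))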
Now let us consider the following equation
\[ \begin{aligned} y_{t} = \delta +\theta_{1} y_{t-1} + \delta_{0}x_{t} + \delta_{1}x_{t-1} + v_{t} \end{aligned} \]
In the above equation, there are four unknown parameters: \(\delta\), \(\theta_{1}\), \(\delta_{0}\), and \(\delta_{1}\). Comparing it with the transformed equation above, the two have the same form, with \(\delta = \beta_{1}(1-\rho)\), \(\theta_{1} = \rho\), \(\delta_{0} = \beta_{2}\), and \(\delta_{1} = -\rho\beta_{2}\). The above equation is an autoregressive distributed lag model, which is the main focus of this exercise. As previously mentioned, it is called an autoregressive distributed lag model because the dependent variable is regressed on its lagged value (the autoregressive component), and it also includes an explanatory variable and its lagged value (the distributed lag component).
This equation can be estimated by linear least squares provided that the \(v_{t}\) satisfy the usual assumptions required for least squares estimation, namely that they have zero mean and constant variance and are uncorrelated. The presence of the lagged dependent variable \(y_{t-1}\) means that a large sample is required for the desirable properties of the least squares estimator to hold, but the least squares procedure is still valid provided that \(v_{t}\) is uncorrelated with current and past values of the right-hand side variables. It is crucial that the \(v_{t}\) be uncorrelated. If they are not, the least squares estimator will be biased, even in large samples!
We can apply our equation to the Phillips curve example we’ve been working with and examine the results.
phillips.ardl.fit <- dynlm(inf ~ L(inf, 1) + d(u, 1) + L(d(u, 1), 1), data=phillips)
summary(phillips.ardl.fit)
##
## Time series regression with "ts" data:
## Start = 1987(4), End = 2009(4)
##
## Call:
## dynlm(formula = inf ~ L(inf, 1) + d(u, 1) + L(d(u, 1), 1), data = phillips)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62915 -0.27944 -0.00313 0.21641 1.97590
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3336 0.0899 3.711 0.000368 ***
## L(inf, 1) 0.5593 0.0908 6.160 2.34e-08 ***
## d(u, 1) -0.6882 0.2499 -2.754 0.007195 **
## L(d(u, 1), 1) 0.3200 0.2575 1.243 0.217464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5221 on 85 degrees of freedom
## Multiple R-squared: 0.3489, Adjusted R-squared: 0.326
## F-statistic: 15.18 on 3 and 85 DF, p-value: 5.37e-08
By now, I think you are able to interpret the results. A 1% increase in inflation in the previous quarter leads to a 0.559% increase in inflation in the current quarter; a 1% increase in the change in the unemployment rate leads to a 0.688% decrease in inflation; and a 1% increase in the previous quarter's change in unemployment leads to a 0.32% increase in inflation (which sounds counterintuitive given that the unemployment rate and inflation are inversely related, but this is not an actual empirical study).
As we’ve already mentioned multiple times, an autoregressive distributed lag model (ARDL) is a model that contains both independent variables and their lagged values as well as the lagged values of the dependent variable. In its more general form, with \(p\) lags of \(y\) and \(q\) lags of \(x\), an ARDL(p, q) model can be written as
\[ \begin{aligned} y_{t}=\delta + \theta_{1}y_{t-1}+ \ldots +\theta_{p}y_{t-p}+\delta_{0}x_{t}+\delta_{1}x_{t-1}+ \ldots + \delta_{q}x_{t-q} + v_{t} \end{aligned} \]
The ARDL model has several advantages. It captures dynamic effects from both the lagged \(x\)'s and the lagged \(y\)'s, and by including a sufficient number of lags of \(y\) and \(x\), we can eliminate serial correlation in the errors. Additionally, an ARDL model can be transformed into one with only lagged \(x\)'s, as the following equation shows
\[ \begin{aligned} y_{t} &= \alpha + \beta_{0}x_{t} + \beta_{1}x_{t-1} + \beta_{2}x_{t-2} + \ldots + e_{t} \\ & = \alpha + \sum_{s=0}^{\infty}{\beta_{s}x_{t-s}+e_{t}} \end{aligned} \]
Because it does not have a finite cut off point, the above equation is called an infinite distributed lag model.
The two main uses for ARDL models are forecasting and multiplier analysis. If the usual least squares assumptions are valid, we can estimate ARDL models using least squares. The main concern is choosing the appropriate lag lengths \(p\) and \(q\). Several criteria can be used, and because they do not all lead to the same choice, there is a degree of subjectivity and judgment that must be used. One such criterion is the Akaike information criterion (AIC),
\[ \begin{aligned} \text{AIC}=\ln{\bigg(\frac{SSE}{T}\bigg)} + \frac{2K}{T} \end{aligned} \]
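As a sketch of how you might use this in practice (one possible approach, not from the original text), you can fit several candidate Phillips curve ARDL(1, q) models with dynlm and compare them with R's built-in \(\text{AIC()}\) function. R's \(\text{AIC()}\) computes a likelihood-based version of the criterion rather than the SSE formula above, but it serves the same model-comparison purpose. Keep in mind that each additional lag drops an observation from the start of the sample, so strictly speaking the models should be compared over a common sample period. The names fit.q0, fit.q1, and fit.q2 are illustrative.

fit.q0 <- dynlm(inf ~ L(inf, 1) + d(u, 1), data=phillips)
fit.q1 <- dynlm(inf ~ L(inf, 1) + L(d(u, 1), 0:1), data=phillips)
fit.q2 <- dynlm(inf ~ L(inf, 1) + L(d(u, 1), 0:2), data=phillips)
AIC(fit.q0, fit.q1, fit.q2)   # smaller AIC is preferred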
We’ve already estimated an ARDL Phillips curve model above; now let’s consider the ARDL(1, 0) version as a starting point. An ARDL(1, 0) Phillips curve model includes one lagged value of the dependent variable \(inf\) and only the current value of the independent variable \(DU\) (i.e. no lagged values of \(DU\)). Let’s estimate this model and examine the results.
phillips.ardl10 <- dynlm(inf ~ L(inf, 1) + d(u, 1), data=phillips)
summary(phillips.ardl10)
##
## Time series regression with "ts" data:
## Start = 1987(3), End = 2009(4)
##
## Call:
## dynlm(formula = inf ~ L(inf, 1) + d(u, 1), data = phillips)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.53555 -0.28680 -0.01992 0.29760 2.11713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.35480 0.08760 4.050 0.000111 ***
## L(inf, 1) 0.52825 0.08508 6.209 1.77e-08 ***
## d(u, 1) -0.49086 0.19215 -2.555 0.012370 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5207 on 87 degrees of freedom
## Multiple R-squared: 0.3464, Adjusted R-squared: 0.3314
## F-statistic: 23.05 on 2 and 87 DF, p-value: 9.252e-09
Given these results, the equation for our fitted model is
\[ \begin{aligned} \widehat{INF_{t}}=0.3548 + 0.528INF_{t-1} - 0.4909DU_{t} \end{aligned} \]
The next step would be to determine whether the errors from our estimated equation are autocorrelated.
acf(phillips.ardl10$residuals, type='correlation')
The correlogram suggests that the autocorrelations of the residuals are not significantly different from zero, so there is no evidence of serial correlation.
Let’s check the first 5 lags and their correlation coefficients.
acf(phillips.ardl10$residuals, type='correlation', plot=FALSE)$acf[2:6]
## [1] -0.12868631 0.06721028 0.11291913 0.18514891 0.09764152
Now let us check the first five lags and their corresponding z-values to determine whether they are significantly different from zero. Recall that the formula for the z-value used to test the null hypothesis \(H_{0}:\rho_{k}=0\) is
\[ \begin{aligned} Z=\frac{r_{k}-0}{\sqrt{1/T}}=\sqrt{T} r_{k} \end{aligned} \]
library(knitr)   # for kable()
correlations <- acf(phillips.ardl10$residuals, type='correlation', plot=FALSE)$acf[2:6]
z.score <- sqrt(91) * correlations # We take the square root of 91 because we have T=91 observations in our data set.
df <- data.frame(lag=c(1:5), z.score)
kable(df, format='markdown', align='c')
| lag |  z.score   |
|:---:|:----------:|
|  1  | -1.2275892 |
|  2  | 0.6411452  |
|  3  | 1.0771798  |
|  4  | 1.7662081  |
|  5  | 0.9314407  |
Because the z-scores for the first five lags are not large enough to reject the null hypothesis, we fail to reject the null and conclude that there is no evidence of autocorrelation among the first five lags.
You can try any combination of \(p\) and \(q\) for your ARDL model. For example, let’s examine the results of an ARDL(1, 2) model. As you might expect, this model includes one lag of the dependent variable \(inf\) and two lags of the change in unemployment \(DU\), in addition to its current value.
phillips.ardl12 <- dynlm(inf ~ L(inf, 1) + L(d(u, 1), 0:2), data=phillips)
summary(phillips.ardl12)
##
## Time series regression with "ts" data:
## Start = 1988(1), End = 2009(4)
##
## Call:
## dynlm(formula = inf ~ L(inf, 1) + L(d(u, 1), 0:2), data = phillips)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49837 -0.26659 -0.00841 0.23388 1.89088
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.34233 0.09016 3.797 0.000278 ***
## L(inf, 1) 0.53673 0.09232 5.814 1.1e-07 ***
## L(d(u, 1), 0:2)0 -0.56569 0.26805 -2.110 0.037832 *
## L(d(u, 1), 0:2)1 0.42084 0.27687 1.520 0.132316
## L(d(u, 1), 0:2)2 -0.29927 0.26791 -1.117 0.267205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5223 on 83 degrees of freedom
## Multiple R-squared: 0.3481, Adjusted R-squared: 0.3167
## F-statistic: 11.08 on 4 and 83 DF, p-value: 3.004e-07
The most important thing to note is that there is no single way of choosing the correct values of \(p\) and \(q\). There are mathematical techniques that can help you determine this, but intuition also plays a part. Go ahead and play around with fitting models at different values of \(p\) and \(q\); sometimes it takes multiple iterations to find the combination that produces the best model.
We’ve already established that ARDL models are useful for both forecasting and strategic analysis (i.e. analyzing the effects of independent variables and their previous values on a dependent variable). Unfortunately, as of right now, the \(\text{dynlm}\) package lacks a built-in forecasting option. That doesn’t mean you can’t forecast future values, however. One way of forecasting is to do it manually (or, since we have the power of R in our hands, to write a function that will do so).
Let us consider using Okun’s Law as an example and forecast the next period’s growth rate. Suppose we have an AR(2) model such that our fitted model is given by the following equation
\[ \begin{aligned} G_{t} = \delta + \theta_{1}G_{t-1} + \theta_{2}G_{t-2} + v_{t} \end{aligned} \]
The equation for forecasting the next period’s growth rate can be given by the following
\[ \begin{aligned} G_{t+1} = \delta + \theta_{1}G_{t} + \theta_{2}G_{t-1} + v_{t+1} \end{aligned} \]
We won’t actually calculate the future value of \(g\) here, but because you already know the values of \(G_{t}\) and \(G_{t-1}\) from your data and the coefficient estimates from your fitted model, you can forecast the next period’s growth rate. Your forecast isn’t limited to one period, either. If you want to forecast the growth rate two periods ahead, the equation becomes
\[ \begin{aligned} G_{t+2} = \delta + \theta_{1}G_{t+1} + \theta_{2}G_{t} + v_{t+2} \end{aligned} \]
Again, you already know all of the values needed to forecast the growth rate two periods ahead; you know the value of \(G_{t+1}\) because you presumably calculated it in the previous step. You can do this for as many periods into the future as you wish.
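If you did want to carry out the calculation, here is a minimal sketch of a manual one- and two-step-ahead forecast, assuming an AR(2) model for growth fitted with dynlm (the object name okun.ar2 is illustrative and not part of the original text).

okun.ar2 <- dynlm(g ~ L(g, 1) + L(g, 2), data=okun)
b <- coef(okun.ar2)                                   # delta, theta_1, theta_2
g.obs <- as.numeric(okun$g)
n <- length(g.obs)
g.ahead1 <- b[1] + b[2]*g.obs[n] + b[3]*g.obs[n-1]    # forecast for period T+1
g.ahead2 <- b[1] + b[2]*g.ahead1 + b[3]*g.obs[n]      # forecast for period T+2
c(g.ahead1, g.ahead2)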
There are multiple methods and techniques for time series regression. We covered three in this exercise: the autoregressive (AR) model, in which the dependent variable is regressed on its own lagged values; the distributed lag model, in which the dependent variable is regressed on an independent variable and its lagged values; and the ARDL model, which combines the autoregressive and distributed lag models to accommodate a lagged dependent variable along with independent variables and their lags. Time series regression can be a very powerful tool. It can be used for forecasting future values as well as for strategic analysis, where one can analyze the effects an independent variable and its lagged values have on a dependent variable.