In this exercise, we will go over a time series regression model called the ARDL model. We will take advantage of the \(\text{dynlm}\) package to implement this model and use Okun’s law as an example. The ARDL model is actually two time series regression models combined, so we will briefly cover the Autoregressive (AR) portion of the model, as well as the Distributed Lag component of the model.
Unlike linear regression, which typically deals with cross-sectional data, time series regression differs in two ways: the observations have a natural ordering in time, and a change in one period can affect outcomes in subsequent periods, creating dynamic relationships between variables.
Given the dynamic nature of time series data, there are three main ways of modeling these dynamic relationships. The first is to let the dependent variable be a function of current and past values of an explanatory variable:
\[ \begin{aligned} y_{t} = f(x_{t}, x_{t-1}, x_{t-2}, \ldots) \end{aligned} \]
We can think of \((y_{t}, x_{t})\) as denoting the values of \(y\) and \(x\) in the current period; \(x_{t-1}\) denotes the value of \(x\) in the previous period, \(x_{t-2}\) denotes the value of \(x\) two periods ago, and so on. Because of these lagged effects, the above equation is called a distributed lag (DL) model.
A second way of capturing dynamic effects is to let the dependent variable depend on its own value in the previous period, as well as the current value of the explanatory variable:
\[ \begin{aligned} y_{t} = f(y_{t-1}, x_{t}) \end{aligned} \]
We can also combine the features of the two previous equations, giving a dynamic model with lagged values of both the dependent and independent variables, such as
\[ \begin{aligned} y_{t} = f(y_{t-1}, x_{t}, x_{t-1}, x_{t-2}, \ldots) \end{aligned} \]
Such models are called autoregressive distributed lag (ARDL) models, which is the model we are interested in implementing. As we have shown, the ARDL model is composed of an autoregressive component, in which the dependent variable is regressed on one or more of its past values, and a distributed lag component, in which the independent variable enters along with one or more of its lagged values.
Finally, dynamic effects can enter through the error term, with the error in one period carrying over into the next:
\[ \begin{aligned} y_{t} = f(x_{t}) + e_{t} && e_{t}=g(e_{t-1}) \end{aligned} \]
where the function \(e_{t}=g(e_{t-1})\) is used to denote the dependence of the error term on its value in the previous period. In this case, \(e_{t}\) is correlated with \(e_{t-1}\), and in such a scenario we say the errors are serially correlated or autocorrelated.
A key least squares assumption that carries over to time series data is that different observations, and hence different errors, are uncorrelated:
\[ \begin{aligned} \text{cov}(y_{t}, y_{s}) = \text{cov}(e_{t}, e_{s}) = 0 && \text{for } t \neq s \end{aligned} \]
The dynamic nature of the second, third, and fourth models introduced above implies correlation between \(y_{t}\) and \(y_{t-1}\), or between \(e_{t}\) and \(e_{t-1}\), so these models clearly violate this assumption.
An assumption that we will maintain throughout this exercise is that the variables in our equations are stationary, meaning they are not explosive, not trending, and not wandering aimlessly without returning to their mean. Stationarity is a topic in its own right, but for our purposes a stationary variable is simply one whose mean, variance, and other statistical properties remain constant over time.
The first dynamic relationship we consider is the distributed lag model introduced above, \(y_{t} = f(x_{t}, x_{t-1}, x_{t-2}, \ldots)\), with the additional assumptions that the relationship is linear and that, after \(q\) time periods, changes in \(x\) no longer have an impact on \(y\). Under these conditions, we have the multiple regression model
\[ \begin{aligned} y_{t} = \alpha + \beta_{0}x_{t} + \beta_{1}x_{t-1} + \beta_{2}x_{t-2} + \ldots + \beta_{q}x_{t-q} + e_{t} \end{aligned} \]
The above model can be treated in the same way as a multiple regression model; instead of a number of different explanatory variables, we have a number of different lags of the same explanatory variable. This equation can be very useful in two ways. First, it can be used for forecasting: given observations through period \(T\), the value of \(y\) in the next period is

\[ \begin{aligned} y_{T+1}=\alpha + \beta_{0}x_{T+1} + \beta_{1}x_{T} + \beta_{2}x_{T-1} + \ldots + \beta_{q}x_{T-q+1}+e_{T+1} \end{aligned} \]

Second, it can be used for policy analysis, because the coefficients tell us how a change in \(x\) in one period affects the expected value of \(y\) in current and later periods:

\[ \begin{aligned} \frac{\partial E(y_{t})}{\partial x_{t-s}} = \frac{\partial E(y_{t+s})}{\partial x_{t}} = \beta_{s} \end{aligned} \]
The effect of a one-unit change in \(x_{t}\) is distributed over the current and next \(q\) periods. The model is called a finite distributed lag model of order \(q\) because it is assumed that after a finite number of periods \(q\), changes in \(x\) no longer have an impact on \(y\). The coefficient \(\beta_{s}\) is called a distributed-lag weight or an \(s\)-period delayed multiplier. The coefficient \(\beta_{0}\) is called the impact multiplier. Adding up the first few coefficients gives the interim multipliers; for example, the interim multiplier for two periods is \((\beta_{0} + \beta_{1} + \beta_{2})\). The total multiplier is the final effect on \(y\) of a sustained increase in \(x\) after \(q\) or more periods have elapsed, given by \(\sum_{s=0}^{q} \beta_{s}\).
In distributed lag models, both \(y\) and \(x\) are typically random. That is, we do not know their values prior to sampling. We do not “set” output growth, for example, and then observe the resulting level of unemployment. To accommodate this, we assume that the \(x\)’s are random and that the error term \(e_{t}\) is independent of all \(x\)’s in the sample - past, current, and future. This assumption, in conjunction with the other multiple regression assumptions, is sufficient for the least squares estimator to be unbiased and to be BLUE, conditional on the \(x\)’s in the sample. In short, the assumptions of the distributed lag model are the usual multiple regression assumptions, with \(e_{t}\) assumed independent of all \(x\)’s.
At this point, we’ve covered the basics needed to apply our model to a real-world example: the finite distributed lag model applied to Okun’s Law. We won’t go into too much detail about what Okun’s Law is, as that is not within the scope of this exercise, but Arthur Melvin Okun was an economist who posited a relationship between the change in unemployment from one period to the next and the rate of growth of output in the economy. Mathematically, Okun’s Law can be expressed as
\[ \begin{aligned} U_{t}-U_{t-1}=-\gamma(G_{t}-G_{N}) \end{aligned} \]
where \(U_{t}\) is the unemployment rate in period \(t\), \(G_{t}\) is the growth rate of output in period \(t\), and \(G_{N}\) is the “normal” growth rate, which we assume is constant over time. We can rewrite the above equation in the more familiar notation of the multiple regression model by denoting the change in unemployment \(U_{t}-U_{t-1}\) as \(DU_{t}=\Delta U_{t}=U_{t}-U_{t-1}\). We then set \(\beta_{0} = -\gamma\) and \(\alpha = \gamma G_{N}\). Adding an error term to our equation yields
\[ \begin{aligned} DU_{t}=\alpha + \beta_{0}G_{t} + e_{t} \end{aligned} \]
Changes in output are likely to have a distributed-lag effect on unemployment; not all of the effect will take place instantaneously. We can therefore further expand our equation to
\[ \begin{aligned} DU_{t} = \alpha + \beta_{0}G_{t} + \beta_{1}G_{t-1} + \beta_{2}G_{t-2} + \ldots + \beta_{q}G_{t-q} + e_{t} \end{aligned} \]
Our initial data set looks something like this
|  g  |  u  |
|:---:|:---:|
| 1.4 | 7.3 |
| 2.0 | 7.2 |
| 1.4 | 7.0 |
| 1.5 | 7.0 |
| 0.9 | 7.2 |
| 1.5 | 7.0 |
| 1.2 | 6.8 |
| 1.5 | 6.6 |
| 1.6 | 6.3 |
| 1.7 | 6.0 |
where the variable g represents the quarterly growth rate of output and the variable u represents the unemployment rate in the corresponding quarter. (The change in unemployment, \(DU_{t}\), will be computed from u when we estimate the model.)
In order to take advantage of the \(\text{dynlm}\) package, we need to convert our variables to time series objects. We can simply do this in R by typing the following
okun$g <- ts(okun$g, start=c(1985,2), frequency=4) # The data is quarterly and the earliest observation in our data set is 1985 Q2.
okun$u <- ts(okun$u, start=c(1985, 2), frequency=4)
To check that we have properly converted our data to time series objects, we can print the growth variable g.
## Qtr1 Qtr2 Qtr3 Qtr4
## 1985 1.4 2.0 1.4
## 1986 1.5 0.9 1.5 1.2
## 1987 1.5 1.6 1.7 2.5
## 1988 1.3 2.2 1.7 2.1
## 1989 2.1 1.7 1.5 0.9
## 1990 2.3 1.6 0.9 -0.1
## 1991 0.6 1.4 1.2 1.0
## 1992 1.6 1.7 1.5 1.6
## 1993 0.8 1.2 1.0 1.9
## 1994 1.5 1.9 1.2 1.6
## 1995 0.8 0.7 1.3 1.2
## 1996 1.3 2.1 1.2 1.7
## 1997 1.4 1.7 1.6 1.1
## 1998 1.1 1.1 1.7 2.0
## 1999 1.3 1.1 1.6 2.2
## 2000 1.1 2.5 0.7 1.1
## 2001 0.3 1.3 0.0 0.7
## 2002 1.2 1.0 0.9 0.6
## 2003 1.1 1.1 2.2 1.4
## 2004 1.6 1.6 1.5 1.6
## 2005 1.9 1.1 1.8 1.4
## 2006 2.1 1.2 0.8 1.2
## 2007 1.4 1.5 1.3 1.1
## 2008 0.3 0.9 0.3 -1.4
## 2009 -1.2 -0.2 0.8
You can see that the time range for our data set runs from Q2 of 1985 through Q3 of 2009. Before we proceed, let’s take a preliminary look at growth and unemployment during this time period. Growth during this period
ts.plot(okun$g, col='blue', ylab='Growth')
Unemployment rate during this time period
ts.plot(okun$u, col='red', ylab='Unemployment Rate')
Now we can harness the power of the \(\text{dynlm}\) package. Before we proceed, there are a couple of things to note. The main function used to estimate our model is the \(\text{dynlm}\) function. Within this function, \(\text{d()}\) can be used to specify the difference of a variable and \(\text{L()}\) can be used to compute the desired lags of a variable. By setting \(\text{L(g, 0:3)}\), for example, we are specifying coefficients for the growth rate in the current period and in each of the past three periods. Setting \(\text{d(u, 1)}\) means we wish to compute the change in the unemployment rate from the previous period. We’ll estimate models with lag lengths of \(q=2\) and \(q=3\) and compare the results.
library(dynlm)
okun.lag3 <- dynlm(d(u, 1) ~ L(g, 0:3), data=okun)
okun.lag2 <- dynlm(d(u, 1) ~ L(g, 0:2), data=okun)
Model estimates for lag length at \(q=3\)
##
## Time series regression with "ts" data:
## Start = 1986(1), End = 2009(3)
##
## Call:
## dynlm(formula = d(u, 1) ~ L(g, 0:3), data = okun)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.42952 -0.14294 0.01658 0.11120 0.42574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.580975 0.053889 10.781 < 2e-16 ***
## L(g, 0:3)0 -0.202053 0.033013 -6.120 2.39e-08 ***
## L(g, 0:3)1 -0.164535 0.035818 -4.594 1.41e-05 ***
## L(g, 0:3)2 -0.071556 0.035304 -2.027 0.0456 *
## L(g, 0:3)3 0.003303 0.036260 0.091 0.9276
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1743 on 90 degrees of freedom
## Multiple R-squared: 0.6524, Adjusted R-squared: 0.637
## F-statistic: 42.23 on 4 and 90 DF, p-value: < 2.2e-16
Model estimates for lag length at \(q=2\)
##
## Time series regression with "ts" data:
## Start = 1985(4), End = 2009(3)
##
## Call:
## dynlm(formula = d(u, 1) ~ L(g, 0:2), data = okun)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.42713 -0.13869 0.01658 0.11170 0.42382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.58356 0.04721 12.360 < 2e-16 ***
## L(g, 0:2)0 -0.20202 0.03238 -6.238 1.33e-08 ***
## L(g, 0:2)1 -0.16533 0.03354 -4.930 3.63e-06 ***
## L(g, 0:2)2 -0.07001 0.03310 -2.115 0.0371 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1726 on 92 degrees of freedom
## Multiple R-squared: 0.6539, Adjusted R-squared: 0.6427
## F-statistic: 57.95 on 3 and 92 DF, p-value: < 2.2e-16
So, how can we interpret these results? Looking at lag length \(q=2\), a 1% increase in the growth rate leads to a fall in the unemployment rate of 0.20 percentage points in the current quarter, a fall of 0.165 in the next quarter, and a fall of 0.07 two quarters later. These values are the impact multiplier and the one-quarter and two-quarter delay multipliers, respectively.
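To connect these numbers to the multiplier terminology introduced earlier, here is a minimal sketch that recovers the impact, interim, and total multipliers from the fitted \(q=2\) model. It assumes the okun.lag2 object estimated above; the lag coefficients follow the intercept in the coefficient vector returned by \(\text{coef()}\).

b.hat <- coef(okun.lag2)[-1]   # beta_0, beta_1, beta_2: impact and delay multipliers
cumsum(b.hat)                  # interim multipliers after 0, 1, and 2 periods
sum(b.hat)                     # total multiplier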
The effect of a change in the value of an explanatory variable is distributed over a number of future periods. If the equation errors are uncorrelated with each other and with the independent variable \(x\), the traditional least squares estimator and its associated testing procedures can be used. With time series data, however, successive observations are likely to be correlated, violating this assumption. When a variable exhibits correlation over time, it is said to be autocorrelated or serially correlated. Autocorrelation in the error can arise from an autocorrelated omitted variable, or it can arise if the dependent variable \(y\) is autocorrelated and this autocorrelation is not adequately explained by the \(x\)’s and their lags included in the equation.
The autocorrelation between a variable \(x\) and its values from the previous period \(x_{t-1}\) can be given by
\[ \begin{aligned} \rho_{1} = \frac{\text{cov}(x_{t}, x_{t-1})}{\sqrt{\text{var}(x_{t})\text{var}(x_{t-1})}} = \frac{\text{cov}(x_{t}, x_{t-1})}{\text{var}(x_{t})} \end{aligned} \]
The notation \(\rho_{1}\) is used to denote the population correlation between observations that are one period apart in time, known also as the population autocorrelation of order one.
The first-order sample autocorrelation for \(x\) is obtained from the above equation by replacing \(\text{cov}(x_{t}, x_{t-1})\) and \(\text{var}(x_{t})\) with their estimates
\[ \begin{aligned} \widehat{\text{cov}(x_{t}, x_{t-1})} = \frac{1}{T-1}\sum_{t=2}^{T}{(x_{t}-\bar{x})(x_{t-1}-\bar{x})} && \widehat{\text{var}(x_{t})} = \frac{1}{T-1}\sum_{t=1}^{T}{(x_{t}-\bar{x})^{2}} \end{aligned} \]
The index of summation in the formula for \(\widehat{\text{cov}(x_{t}, x_{t-1})}\) starts at \(t=2\) because we do not observe \(x_{0}\). Making the substitutions, and using \(r_{1}\) to denote the sample autocorrelation yields
\[ \begin{aligned} r_{1} = \frac{\sum_{t=2}^{T}(x_{t}-\bar{x})(x_{t-1}-\bar{x})}{\sum_{t=1}^{T} (x_{t}-\bar{x})^{2}} \end{aligned} \]
More generally, the \(k\)-th order sample autocorrelation for a series \(y\) that gives the correlation between observations that are \(k\) periods apart (i.e. the correlation between \(y_{t}\) and \(y_{t-k}\)) is given by
\[ \begin{aligned} r_{k} = \frac{\sum_{t=k+1}^{T} (y_{t}-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^{2}} \end{aligned} \]
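Before turning to R’s built-in function, here is a minimal sketch (assuming the okun data loaded above) that computes \(r_{k}\) directly from this formula, so you can verify it against the \(\text{acf()}\) output shown below.

x <- as.numeric(okun$g)
xbar <- mean(x)
n <- length(x)                 # T in the formula above
k <- 1                         # first-order sample autocorrelation
r.k <- sum((x[(k + 1):n] - xbar) * (x[1:(n - k)] - xbar)) / sum((x - xbar)^2)
r.k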
In practice, you do not need to compute these by hand. Simply use the \(\text{acf()}\) function and pass in a time series object. Using our Okun’s Law data, the autocorrelations for g are
acf(okun$g, type='correlation', plot=FALSE)$acf
## , , 1
##
## [,1]
## [1,] 1.000000000
## [2,] 0.494256756
## [3,] 0.410707303
## [4,] 0.154420500
## [5,] 0.200437884
## [6,] 0.090385381
## [7,] 0.024471113
## [8,] -0.030084345
## [9,] -0.082319780
## [10,] 0.044106611
## [11,] -0.021284834
## [12,] -0.086834634
## [13,] -0.204043262
## [14,] -0.144183665
## [15,] -0.069486482
## [16,] -0.064020474
## [17,] -0.081797420
## [18,] -0.084794592
## [19,] -0.076679285
## [20,] 0.001031665
This returns an array of the sample autocorrelations of g. You can ignore the first number, which is the lag-0 autocorrelation: \(g_{t}\) is of course perfectly correlated with itself. Thus, the sample autocorrelations of g for the first \(k=4\) lags are
\[ \begin{aligned} r_{1} = 0.494 \quad r_{2} = 0.411 \quad r_{3} = 0.154 \quad r_{4} = 0.2 \end{aligned} \]
In order to test whether an autocorrelation is significantly different from zero, let the \(k\)-th order population autocorrelation be denoted by \(\rho_{k}\). When the null hypothesis \(H_{0}:\rho_{k}=0\) is true, it turns out that \(r_{k}\) has an approximate normal distribution with zero mean and variance \(1/T\). Thus, a suitable test statistic is
\[ \begin{aligned} Z = \frac{r_{k}-0}{\sqrt{1/T}} = \sqrt{T}r_{k} \sim N(0,1) \end{aligned} \]
We reject the null if the test statistic \(Z_{k}\) is \(\geq\) 1.96 or \(Z_{k}\) \(\leq\) -1.96. Given \(T=98\) observations in our Okun’s Law data set, the test statistics for the first four lags are
\[ \begin{aligned} Z_{1} = \sqrt{98}\times0.494=4.89, && Z_{2}=\sqrt{98}\times0.411=4.07 \\ Z_{3} = \sqrt{98}\times0.154=1.52, && Z_{4}=\sqrt{98}\times0.200 =1.98 \end{aligned} \]
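These statistics are straightforward to reproduce with the autocorrelations we just computed; a quick sketch:

r <- acf(okun$g, type='correlation', plot=FALSE)$acf[2:5]   # r_1 through r_4
round(sqrt(length(okun$g)) * r, 2)                          # Z_1 through Z_4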
A useful tool for assessing the significance of autocorrelations is a diagrammatic representation called the correlogram. The correlogram, or sample autocorrelation function, plots the correlations between observations that are one period apart, two periods apart, and so on. We can easily construct a correlogram using the same \(\text{acf()}\) function we used previously to calculate the correlations.
acf(okun$g, type='correlation')
What’s nice about R’s correlogram is that it draws significance bands at approximately \(\pm 0.2\) (that is, \(\pm 1.96/\sqrt{T}\)), making it relatively easy to determine whether a particular lag is significantly different from zero. Recall that we reject the null hypothesis that \(\rho_{k}=0\) if \(Z_{k} \geq 1.96\) or \(Z_{k} \leq -1.96\). Be careful how you interpret the correlogram: the first line, at lag 0, is to be ignored, and the first lag begins with the second line. Looking at the correlogram, the autocorrelation at lag 3 is not significantly different from zero, which matches what we already computed (\(Z_{3}=1.52\)).
In the presence of serially correlated errors, the consequences of least squares estimation are similar to the consequences of ignoring heteroskedasticity: the least squares estimator is still linear and unbiased, but it is no longer best, and the usual standard errors are incorrect, which invalidates the usual hypothesis tests and confidence intervals.
By using HAC standard errors with least squares, we can avoid having to specify the precise nature of the autocorrelated error model that is required for an alternative estimator with a lower variance. For HAC standard errors to be valid, we need to assume that the autocorrelations go to zero as the time between observations increases (a condition necessary for stationarity), and we need a large sample, but it is not necessary to make a precise assumption about the autocorrelated error model.
To get a feel for how HAC standard errors are found, consider the simple regression model \(y_{t}=\beta_{1}+\beta_{2}x_{t}+e_{t}\). The variance of the least squares estimator \(b_{2}\) can be written as
\[ \begin{aligned} \text{var}(b_{2}) & = \sum_{t}{w_{t}^{2}\text{var}(e_{t})} + \sum_{t \neq s}{\sum{w_{t}w_{s}\text{cov}(e_{t}, e_{s})}} \\ & = \sum_{t}{w_{t}^{2}\text{var}(e_{t})} \Bigg[1+\frac{\sum_{t \neq s}{\sum{w_{t}w_{s}\text{cov}(e_{t}, e_{s})}}}{\sum_{t}{w_{t}^{2}\text{var}(e_{t})}}\Bigg] \end{aligned} \]
where \(w_{t}=(x_{t}-\bar{x})/\sum_{t}{(x_{t}-\bar{x})^{2}}\). When the errors are not correlated, \(\text{cov}(e_{t},e_{s})=0\) for \(t \neq s\), and the term in the square brackets is equal to one. The resulting expression \(\text{var}(b_{2})=\sum_{t}{w_{t}^{2}}\text{var}(e_{t})\) is the one used to find HC (heteroskedasticity-consistent) standard errors. If we denote the quantity in the square brackets as \(g\) and its estimate as \(\hat{g}\), then the relationship between the two estimated variances is
\[ \begin{aligned} \widehat{\text{var}_{HAC}(b_{2})}=\widehat{\text{var}_{HC}(b_{2})}\times \hat{g} \end{aligned} \]
The HAC variance estimate is equal to the HC variance estimate multiplied by an extra term that depends on the serial correlation in errors.
Calculating HAC standard errors in R requires the \(\text{lmtest}\) and \(\text{sandwich}\) packages. You can compute the HAC standard errors as follows
library(lmtest)
library(sandwich)
# phillips.fit is the simple Phillips curve regression of inflation (inf) on the
# change in the unemployment rate, d(u, 1), estimated with dynlm on the phillips data:
phillips.fit <- dynlm(inf ~ d(u, 1), data=phillips)
coeftest(phillips.fit, vcov=vcovHAC(phillips.fit))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.777621 0.094854 8.1980 1.824e-12 ***
## d(u, 1) -0.527864 0.304444 -1.7339 0.08644 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s compare the least squares standard errors to the HAC standard errors
\[ \begin{aligned} \widehat{INF}= & 0.7776 & - 0.5279DU \\ & (0.0658) & (0.2294) && & (\text{incorrect se}) \\ & (0.0949) & (0.3044) && & (\text{HAC se}) \end{aligned} \]
The HAC standard errors are larger than the least squares ones, implying that if we ignore the autocorrelation, we will overstate the reliability of the least squares estimates. The \(t\) statistics and corresponding \(p\)-values for testing the null hypothesis \(H_{0}:\beta_{2}=0\) are
\[ \begin{aligned} & t = -2.301 && & p = 0.0238 && & \text{(from LS standard errors)} \\ & t = -1.734 && & p = 0.0865 && & \text{(from HAC standard errors)} \end{aligned} \]
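To see where the two sets of standard errors come from, a minimal sketch that extracts both the conventional and the HAC standard errors from the phillips.fit object estimated above:

sqrt(diag(vcov(phillips.fit)))      # conventional least squares standard errors
sqrt(diag(vcovHAC(phillips.fit)))   # HAC standard errors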
Using least squares with HAC standard errors overcomes the negative consequences that autocorrelated errors have on the least squares standard errors, but it does not address the issue of finding an estimator that is better, in the sense that it has a lower variance. One way to proceed is to make an assumption about the model that generates the autocorrelated errors, and to derive an estimator compatible with that assumption. To introduce such a model (the same specification used in the Lagrange multiplier test for serially correlated errors), we model the correlation between \(e_{t}\) and \(e_{t-1}\) by writing \(e_{t}\) as dependent on \(e_{t-1}\) through the equation
\[ \begin{aligned} e_{t} = \rho e_{t-1} + v_{t} \end{aligned} \]
If we assume that the \(v_{t}\)’s are uncorrelated random errors with zero mean and constant variance, then our assumptions are
\[ \begin{aligned} E(v_{t})=0 \quad \text{var}(v_{t})=\sigma_{v}^{2} \quad \text{cov}(v_{t}, v_{s})=0 \quad \text{for } t \neq s \end{aligned} \]
With these assumptions, the model \(e_{t}=\rho e_{t-1} + v_{t}\) describes a first-order autoregressive model, abbreviated as an AR(1) model. It is called autoregressive because it can be viewed as a regression model in which \(e_{t}\) depends on its own lagged value, inducing autocorrelation. It is called first-order because the right-hand side variable is \(e_{t}\) lagged one period (i.e. \(e_{t-1}\)).
One way to estimate a regression model with serially correlated errors is to assume that those errors follow an AR(1) model and develop an estimation procedure relevant to that model.
One additional assumption we need to make about our AR(1) model, to ensure that \(e_{t}\) is stationary, is that \(\rho\) is less than one in absolute value. That is,
\[ \begin{aligned} -1 < \rho < 1 \end{aligned} \]
The mean and variance of \(e_{t}\) are
\[ \begin{aligned} E(e_{t})=0 \qquad \text{var}(e_{t})=\sigma_{e}^{2} = \frac{\sigma_{v}^{2}}{1-\rho^{2}} \end{aligned} \]
The AR(1) error \(e_{t}\) has zero mean and a variance that depends on the variance of \(v_{t}\) and the magnitude of \(\rho\). The larger the degree of autocorrelation (i.e. the closer \(\rho\) is to \(+1\) or \(-1\)), the larger the variance of \(e_{t}\). Additionally, since \(\sigma_{v}^{2}/(1-\rho^{2})\) is constant over time, \(e_{t}\) is homoskedastic. The covariance between the two errors that are \(k\) periods apart is
\[ \begin{aligned} \text{cov}(e_{t}, e_{t-k})=\frac{\rho^{k}\sigma_{v}^{2}}{1-\rho^{2}} \end{aligned} \]
Using the previous correlation formula \(\rho_{1} = \frac{\text{cov}(x_{t}, x_{t-1})}{\sqrt{\text{var}(x_{t})\text{var}(x_{t-1})}} = \frac{\text{cov}(x_{t}, x_{t-1})}{\text{var}(x_{t})}\), which we covered earlier, we can obtain the following equation
\[ \begin{aligned} \rho_{k} & =\text{corr}(e_{t}, e_{t-k})=\frac{\text{cov}(e_{t}, e_{t-k})}{\sqrt{\text{var}(e_{t})\text{var}(e_{t-k})}} = \frac{\text{cov}(e_{t}, e_{t-k})}{\text{var}(e_{t})} \\ & = \frac{\rho^{k}\sigma_{v}^{2}/(1-\rho^{2})}{\sigma_{v}^{2}/(1-\rho^{2})} \\ & = \rho^{k} \end{aligned} \]
The above formula states that \(\rho_{k}=\rho^{k}\). The correlation between two errors that are \(k\) periods apart (\(\rho_{k})\) is given by \(\rho\) raised to the power of \(k\). An interpretation of the unknown parameter \(\rho\) can be obtained by setting \(k=1\). Specifically
\[ \begin{aligned} \rho_{1} = \text{corr}(e_{t}, e_{t-1})=\rho \end{aligned} \]
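As a quick empirical check of the \(\rho_{k}=\rho^{k}\) result, here is a simulation sketch (not part of the original analysis): generate a long AR(1) series with \(\rho=0.6\) and compare its sample autocorrelations with \(\rho^{k}\).

set.seed(123)
e.sim <- arima.sim(model=list(ar=0.6), n=10000)   # simulate an AR(1) process with rho = 0.6
round(acf(e.sim, plot=FALSE)$acf[2:5], 3)         # sample autocorrelations r_1, ..., r_4
round(0.6^(1:4), 3)                               # theoretical values rho^k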
We can relate these results to the errors from our Phillips curve example. The sample autocorrelations of the least squares residuals for the first five lags are
## [1] 0.5486586 0.4557325 0.4332158 0.4204936 0.3390342
Now suppose the errors follow an AR(1) model where we have only one unknown parameter \(\rho\). In this case,
\[ \begin{aligned} \hat{\rho_{1}} = \hat{\rho} = r_{1} = 0.549 \end{aligned} \]
Imposing the structure of the AR(1) error model leads to the following estimates at longer lags
phillips.acf <- acf(phillips.fit$residuals, type='correlation', plot=FALSE)$acf
# phillips.acf[2] is r_1 = 0.549, our estimate of rho; raise it to the powers 2 through 6
for (i in 2:6){
  print(phillips.acf[2]^i)
}
## [1] 0.3010263
## [1] 0.1651607
## [1] 0.09061684
## [1] 0.04971771
## [1] 0.02727805
That is,
\[ \begin{aligned} & \hat{\rho}_{2}=\hat{\rho}^{2}=(0.549)^{2}=.301 \\ & \hat{\rho}_{3}=\hat{\rho}^{3}=(0.549)^{3}=.165 \\ & \hat{\rho}_{4}=\hat{\rho}^{4}=(0.549)^{4}=.091 \\ & \hat{\rho}_{5}=\hat{\rho}^{5}=(0.549)^{5}=.05 \\ & \hat{\rho}_{6}=\hat{\rho}^{6}=(0.549)^{6}= 0.027 \end{aligned} \]
Notice that the values are considerably smaller than the unrestricted estimates of the correlogram, suggesting that the AR(1) assumption might not be adequate.
In this section, we develop an estimator for the regression model with AR(1) errors. First let us summarize the model and its assumptions
\[ \begin{aligned} y_{t} = \beta_{1} + \beta_{2}x_{t} + e_{t} \quad \text{with} \quad e_{t}=\rho e_{t-1} + v_{t} \end{aligned} \]
The \(v_{t}\)’s are uncorrelated random variables with zero mean and constant variance \(\sigma_{v}^{2}\).
\[ \begin{aligned} E(v_{t})=0 \qquad \text{var}(v_{t})=\sigma_{v}^{2} \qquad \text{cov}(v_{t},v_{s})=0 \quad \text{for } t \neq s \end{aligned} \]
Substituting \(e_{t}=\rho e_{t-1} + v_{t}\) into \(y_{t} = \beta_{1} + \beta_{2}x_{t} + e_{t}\) yields
\[ \begin{aligned} y_{t} = \beta_{1} + \beta_{2}x_{t} + \rho e_{t-1} + v_{t} \end{aligned} \]
From the regression equation, the error in the previous period can be written as
\[ \begin{aligned} e_{t-1} = y_{t-1} - \beta_{1} - \beta_{2}x_{t-1} \end{aligned} \]
Multiplying the above equation by \(\rho\) gives us
\[ \begin{aligned} \rho e_{t-1} = \rho y_{t-1} - \rho\beta_{1}-\rho\beta_{2}x_{t-1} \end{aligned} \]
Substituting the above equation into \(y_{t} = \beta_{1} + \beta_{2}x_{t} + \rho e_{t-1} + v_{t}\) yields
\[ \begin{aligned} y_{t}=\beta_{1}(1-\rho) + \beta_{2}x_{t} + \rho y_{t-1} - \rho\beta_{2}x_{t-1} + v_{t} \end{aligned} \]
We’ve transformed the original model with the autocorrelated term \(e_{t}\) into a new model given by the above equation that has an error term \(v_{t}\) that is uncorrelated over time.
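Note that this transformed equation is nonlinear in the parameters (the coefficient on \(x_{t-1}\) is the product \(-\rho\beta_{2}\)), so it cannot be estimated directly with lm or dynlm. As an illustration only (not from the original text), here is a sketch of how it could be estimated for the Phillips curve data using nonlinear least squares; the starting values are rough guesses based on our earlier estimates.

u   <- as.numeric(phillips$u)
inf <- as.numeric(phillips$inf)
du  <- c(NA, diff(u))                      # DU_t = U_t - U_{t-1}
n   <- length(inf)
dat <- data.frame(y = inf[3:n], y1 = inf[2:(n-1)],
                  x = du[3:n],  x1 = du[2:(n-1)])
nls(y ~ b1*(1 - rho) + b2*x + rho*y1 - rho*b2*x1,
    data = dat, start = list(b1 = 0.8, b2 = -0.5, rho = 0.5))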
Now let us consider the following equation
\[ \begin{aligned} y_{t} = \delta +\theta_{1} y_{t-1} + \delta_{0}x_{t} + \delta_{1}x_{t-1} + v_{t} \end{aligned} \]
In the above equation, there are four unknown parameters: \(\delta\), \(\theta_{1}\), \(\delta_{0}\), and \(\delta_{1}\). Comparing it with the transformed equation above, the two have the same form, with \(\delta = \beta_{1}(1-\rho)\), \(\theta_{1} = \rho\), \(\delta_{0} = \beta_{2}\), and \(\delta_{1} = -\rho\beta_{2}\). The above equation is an autoregressive distributed lag model, which is the main focus of this exercise. As previously mentioned, it is called an autoregressive distributed lag model because the dependent variable is regressed on its lagged value (the autoregressive component), and it also includes an explanatory variable and its lagged value (the distributed lag component).
This equation can be estimated by linear least squares provided that the \(v_{t}\) satisfy the usual assumptions required for least squares estimation, namely that they have zero mean and constant variance and are uncorrelated. The presence of the lagged dependent variable \(y_{t-1}\) means that a large sample is required for the desirable properties of the least squares estimator to hold, but the least squares procedure is still valid provided that \(v_{t}\) is uncorrelated with current and past values of the right-hand side variables. It is crucial that the \(v_{t}\) be uncorrelated. If they are not, the least squares estimator will be biased, even in large samples!
We can apply our equation to the Phillips curve example we’ve been working with and examine the results.
phillips.ardl.fit <- dynlm(inf ~ L(inf, 1) + d(u, 1) + L(d(u, 1), 1), data=phillips)
summary(phillips.ardl.fit)
##
## Time series regression with "ts" data:
## Start = 1987(4), End = 2009(4)
##
## Call:
## dynlm(formula = inf ~ L(inf, 1) + d(u, 1) + L(d(u, 1), 1), data = phillips)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62915 -0.27944 -0.00313 0.21641 1.97590
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3336 0.0899 3.711 0.000368 ***
## L(inf, 1) 0.5593 0.0908 6.160 2.34e-08 ***
## d(u, 1) -0.6882 0.2499 -2.754 0.007195 **
## L(d(u, 1), 1) 0.3200 0.2575 1.243 0.217464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5221 on 85 degrees of freedom
## Multiple R-squared: 0.3489, Adjusted R-squared: 0.326
## F-statistic: 15.18 on 3 and 85 DF, p-value: 5.37e-08
By now, I think you are able to interpret the results. A 1% increase in inflation in the previous quarter leads to a 0.559% increase in inflation in the current quarter; a 1% increase in the change in the unemployment rate leads to a 0.688% decrease in inflation; and a 1% increase in the previous quarter's change in unemployment leads to a 0.32% increase in inflation (which sounds counterintuitive given that the unemployment rate and inflation are inversely related, but this is not an actual empirical study).
As we’ve already mentioned multiple times, an autoregressive distributed lag model (ARDL) is a model that contains both independent variables and their lagged values as well as the lagged values of the dependent variable. In its more general form, with \(p\) lags of \(y\) and \(q\) lags of \(x\), an ARDL(p, q) model can be written as
\[ \begin{aligned} y_{t}=\delta + \theta_{1}y_{t-1}+ \ldots +\theta_{p}y_{t-p}+\delta_{0}x_{t}+\delta_{1}x_{t-1}+ \ldots + \delta_{q}x_{t-q} + v_{t} \end{aligned} \]
The ARDL model has several advantages. It captures dynamic effects from both the lagged \(x\)'s and the lagged \(y\)'s, and by including a sufficient number of lags of \(y\) and \(x\), we can eliminate serial correlation in the errors. Additionally, an ARDL model can be transformed into one with only lagged \(x\)'s, as the following equation shows
\[ \begin{aligned} y_{t} &= \alpha + \beta_{0}x_{t} + \beta_{1}x_{t-1} + \beta_{2}x_{t-2} + \ldots + e_{t} \\ & = \alpha + \sum_{s=0}^{\infty}{\beta_{s}x_{t-s}+e_{t}} \end{aligned} \]
Because it does not have a finite cut off point, the above equation is called an infinite distributed lag model.
The two main uses for ARDL models are forecasting and multiplier analysis. If the usual least squares assumptions are valid, we can estimate ARDL models using least squares. The main concern is choosing the appropriate lag lengths \(p\) and \(q\). Several criteria can be used, and because they do not all lead to the same choice, there is a degree of subjectivity and judgment that must be used. One such criterion is the Akaike information criterion (AIC),
\[ \begin{aligned} \text{AIC}=\ln{\bigg(\frac{SSE}{T}\bigg)} + \frac{2K}{T} \end{aligned} \]
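As a sketch of how you might use this in practice (one possible approach, not from the original text), you can fit several candidate Phillips curve ARDL(1, q) models with dynlm and compare them with R's built-in \(\text{AIC()}\) function. R's \(\text{AIC()}\) computes a likelihood-based version of the criterion rather than the SSE formula above, but it serves the same model-comparison purpose. Keep in mind that each additional lag drops an observation from the start of the sample, so strictly speaking the models should be compared over a common sample period. The names fit.q0, fit.q1, and fit.q2 are illustrative.

fit.q0 <- dynlm(inf ~ L(inf, 1) + d(u, 1), data=phillips)
fit.q1 <- dynlm(inf ~ L(inf, 1) + L(d(u, 1), 0:1), data=phillips)
fit.q2 <- dynlm(inf ~ L(inf, 1) + L(d(u, 1), 0:2), data=phillips)
AIC(fit.q0, fit.q1, fit.q2)   # smaller AIC is preferred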
We’ve already estimated an ARDL Phillips curve model above; now let’s consider the ARDL(1, 0) version as a starting point. An ARDL(1, 0) Phillips curve model includes one lagged value of the dependent variable \(inf\) and only the current value of the independent variable \(DU\) (i.e. no lagged values of \(DU\)). Let’s estimate this model and examine the results.
phillips.ardl10 <- dynlm(inf ~ L(inf, 1) + d(u, 1), data=phillips)
summary(phillips.ardl10)
##
## Time series regression with "ts" data:
## Start = 1987(3), End = 2009(4)
##
## Call:
## dynlm(formula = inf ~ L(inf, 1) + d(u, 1), data = phillips)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.53555 -0.28680 -0.01992 0.29760 2.11713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.35480 0.08760 4.050 0.000111 ***
## L(inf, 1) 0.52825 0.08508 6.209 1.77e-08 ***
## d(u, 1) -0.49086 0.19215 -2.555 0.012370 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5207 on 87 degrees of freedom
## Multiple R-squared: 0.3464, Adjusted R-squared: 0.3314
## F-statistic: 23.05 on 2 and 87 DF, p-value: 9.252e-09
Given these results, the equation for our fitted model is
\[ \begin{aligned} \widehat{INF_{t}}=0.3548 + 0.528INF_{t-1} - 0.4909DU_{t} \end{aligned} \]
The next step would be to determine whether the errors from our estimated equation are autocorrelated.
acf(phillips.ardl10$residuals, type='correlation')
The correlogram suggests that the autocorrelations of the residuals are not significantly different from zero, so there is no evidence of serial correlation.
Let’s check the first 5 lags and their correlation coefficients.
acf(phillips.ardl10$residuals, type='correlation', plot=FALSE)$acf[2:6]
## [1] -0.12868631 0.06721028 0.11291913 0.18514891 0.09764152
Now let us check the first five lags and their corresponding z-values to determine whether they are significantly different from zero. Recall that the formula for the z-value used to test the null hypothesis \(H_{0}:\rho_{k}=0\) is
\[ \begin{aligned} Z=\frac{r_{k}-0}{\sqrt{1/T}}=\sqrt{T} r_{k} \end{aligned} \]
library(knitr)   # for kable()
correlations <- acf(phillips.ardl10$residuals, type='correlation', plot=FALSE)$acf[2:6]
z.score <- sqrt(91) * correlations # We take the square root of 91 because we have T=91 observations in our data set.
df <- data.frame(lag=c(1:5), z.score)
kable(df, format='markdown', align='c')
| lag |  z.score   |
|:---:|:----------:|
|  1  | -1.2275892 |
|  2  | 0.6411452  |
|  3  | 1.0771798  |
|  4  | 1.7662081  |
|  5  | 0.9314407  |
Because the z-scores for the first five lags are not large enough to reject the null hypothesis, we fail to reject the null and conclude that there is no evidence of autocorrelation among the first five lags.
You can try any combination of \(p\) and \(q\) for your ARDL model. For example, let’s examine the results of an ARDL(1, 2) model. As you might expect, this model includes one lag of the dependent variable \(inf\) and two lags of the change in unemployment \(DU\), in addition to its current value.
phillips.ardl12 <- dynlm(inf ~ L(inf, 1) + L(d(u, 1), 0:2), data=phillips)
summary(phillips.ardl12)
##
## Time series regression with "ts" data:
## Start = 1988(1), End = 2009(4)
##
## Call:
## dynlm(formula = inf ~ L(inf, 1) + L(d(u, 1), 0:2), data = phillips)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49837 -0.26659 -0.00841 0.23388 1.89088
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.34233 0.09016 3.797 0.000278 ***
## L(inf, 1) 0.53673 0.09232 5.814 1.1e-07 ***
## L(d(u, 1), 0:2)0 -0.56569 0.26805 -2.110 0.037832 *
## L(d(u, 1), 0:2)1 0.42084 0.27687 1.520 0.132316
## L(d(u, 1), 0:2)2 -0.29927 0.26791 -1.117 0.267205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5223 on 83 degrees of freedom
## Multiple R-squared: 0.3481, Adjusted R-squared: 0.3167
## F-statistic: 11.08 on 4 and 83 DF, p-value: 3.004e-07
The most important thing to note is that there is no single way of choosing the correct values of \(p\) and \(q\). There are mathematical techniques that can help you determine this, but intuition also plays a part. Go ahead and play around with fitting models at different values of \(p\) and \(q\); sometimes it takes multiple iterations to find the combination that produces the best model.
We’ve already established that ARDL models are useful for both forecasting and strategic analysis (i.e. analyzing the effects of independent variables and their previous values on a dependent variable). Unfortunately, as of right now, the \(\text{dynlm}\) package lacks a built-in forecasting option. That doesn’t mean you can’t forecast future values, however. One way of forecasting is to do it manually (or, since we have the power of R in our hands, to write a function that will do so).
Let us consider using Okun’s Law as an example and forecast the next period’s growth rate. Suppose we have an AR(2) model such that our fitted model is given by the following equation
\[ \begin{aligned} G_{t} = \delta + \theta_{1}G_{t-1} + \theta_{2}G_{t-2} + v_{t} \end{aligned} \]
The equation for forecasting the next period’s growth rate can be given by the following
\[ \begin{aligned} G_{t+1} = \delta + \theta_{1}G_{t} + \theta_{2}G_{t-1} + v_{t+1} \end{aligned} \]
We won’t actually calculate the future value of \(g\) here, but because you already know the values of \(G_{t}\) and \(G_{t-1}\) from your data and the coefficient estimates from your fitted model, you can forecast the next period’s growth rate. Your forecast isn’t limited to one period, either. If you want to forecast the growth rate two periods ahead, the equation becomes
\[ \begin{aligned} G_{t+2} = \delta + \theta_{1}G_{t+1} + \theta_{2}G_{t} + v_{t+2} \end{aligned} \]
Again, you already know all of the values needed to forecast the growth rate two periods ahead; you know the value of \(G_{t+1}\) because you presumably calculated it in the previous step. You can do this for as many periods into the future as you wish.
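If you did want to carry out the calculation, here is a minimal sketch of a manual one- and two-step-ahead forecast, assuming an AR(2) model for growth fitted with dynlm (the object name okun.ar2 is illustrative and not part of the original text).

okun.ar2 <- dynlm(g ~ L(g, 1) + L(g, 2), data=okun)
b <- coef(okun.ar2)                                   # delta, theta_1, theta_2
g.obs <- as.numeric(okun$g)
n <- length(g.obs)
g.ahead1 <- b[1] + b[2]*g.obs[n] + b[3]*g.obs[n-1]    # forecast for period T+1
g.ahead2 <- b[1] + b[2]*g.ahead1 + b[3]*g.obs[n]      # forecast for period T+2
c(g.ahead1, g.ahead2)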
There are multiple methods and techniques for time series regression. We covered three in this exercise: the autoregressive (AR) model, in which the dependent variable is regressed on its own lagged values; the distributed lag model, in which the dependent variable is regressed on an independent variable and its lagged values; and the ARDL model, which combines the autoregressive and distributed lag models to accommodate a lagged dependent variable along with independent variables and their lags. Time series regression can be a very powerful tool. It can be used for forecasting future values as well as for strategic analysis, where one can analyze the effects an independent variable and its lagged values have on a dependent variable.