I. Introduction

In this assignment we aim to use a vector autoregression model to forecast the real GDP growth rate for the first quarter of 2017 and compare our forecast to three well known forecasts; namely, (1) The Fedearl Bank of New York Nowcast, (2) the GDPNow Federal Bank of Atlanta forecast, and (3) the Wall Street Journal Economic Forecasting Survey. We estimate that the real GDP growth rate will be 3.09% for the first quarter of 2017.

The remainder of the assignment is as follows: First we run a bivarted reduced form VAR model using the real GDP growth rate and the inflation adjusted S&P500 returns. In Section II we download both of these variables. In Section III part A we model a bivariate VAR model over these variables and show the impulse response functions (IRFs) and forecast error varaince decompositions (FEVDs) for each variable. In part B of Section III we run a Granger Causality test to determine if we need to restrict our models. In Section III part C we restrict our VAR model from part A based upon the Granger test from part B. In part D of Section III we reexamine the IRF and FEVD functions for the new restricted model from part C. In part E we reestimate our VAR model to include a third variable, the leading index of the United States. We also restrict this model based upon a Granger Causality test. Finally, in part F of Section III we use the three variable restricted VAR system to forecast the real GDP growth rate and compare our forecast to three other measures. Section IV concludes.

II. Downloading the Data

For this assignment we use four time series in total, first using only three of the time series and then introducing the last time series later in the assignment. First we obtain quarterly time series for U.S. real GDP, the GDP deflator, and the quarterly closing value of the S&P 500 Index. We then use these three variables to construct two time series, \(y_{1,t}\), which approximates the annualized growth rate of the U.S. real GDP and \(y_{2,t}\) which approximates the inflation adjusted annual return of the S&P 500. More formally we define \(y_{1,t}\) and \(y_{2,t}\) as:

\[ y_{1,t} = 400\Delta\log {GDP}_t \] \[ y_{2,t} = 400(\Delta\log {SP500}_t - \Delta\log {p}_t^{GDP}) \]

We say that \({GDP}_t\) is the quarterly time series for U.S. real GDP, \({SP500}_t\) is the quarterly closing value of the S&P 500 Index, and \({p}_t^{GDP}\) is the GDP deflator. We use the Quandl package to download the data into R directly. We then perform the above transformations to create \(y_{1,t}\) and \(y_{2,t}\). We also restrict the sample for the first quarter of 1961 to the last quarter of 2016. The code to perform this follows:

  # Import the Data
Quandl.api_key("75HFthvbcxsrteNWupus")

data.realGDP <- Quandl("FRED/GDPC96", type = 'zoo')
data.GDPdeflator <- Quandl("FRED/GDPDEF", type = 'zoo')
data.SP500 <- Quandl("YAHOO/INDEX_GSPC/CLOSE", type = 'zoo', collapse = "quarterly")


  # Align the data to have same start and end time 
sample.start <- 1960.75
sample.end <- 2016.75

x1 <- window(data.realGDP, start = sample.start, end = sample.end)
x2 <- window(data.GDPdeflator, start = sample.start, end = sample.end)
x3 <- window(data.SP500, start = sample.start, end = sample.end)


  # Construct the two time series varaibles as specified in the instructions
y1 <- 400 * diff(log(x1), differences = 1)
y2 <- 400 * diff(log(x3), differences = 1) - diff(log(x2), differences = 1)

For convience we plot a time series of our two data sets:

Later in the assignment we will include a third variable \(y_{3,t}\), the leading index for the United States. Like before we use the Quandl package to load the data directly into R, however the data starts in the first quarter of 1982, therefore we will later restrict both \(y_{1,t}\) and \(y_{2,t}\) to also be restricted to start in the first quarter of 1982. The code to do so will be given later in the assignment when we need the data.

III. The Vector Autoregression Model

Part A - Estimating a Reduced Form Bivariate VAR

We start by estimating a bivariate reduced form vector autoregression model (VAR) for \(\mathbf{y_t} = (y_{1,t}, y_{2,t})'\) for the sample period (1961Q1 - 2016Q4). We create the vector \(\mathbf{y_t}\) simply by combining \(y_{1,t}\) and \(y_{2,t}\) as columns over each quarter in time across the sample.

y <- cbind(y1, y2)

To choose how many lags need to be included in our VAR model we use the VARselect function. This function calculates four different information criteria across a number of different lags (up to a maximium specified within the function) and chooses the lag that has the lowest information criteria for each of the four statistics.

VARselect(y, lag.max = 12, type = "const")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     2      2      2      2 

$criteria
                 1           2           3           4           5
AIC(n)    9.183713    9.098032    9.119926    9.140073    9.159218
HQ(n)     9.222109    9.162025    9.209517    9.255261    9.300003
SC(n)     9.278711    9.256361    9.341588    9.425067    9.507543
FPE(n) 9737.277678 8937.838988 9135.968916 9322.401241 9503.400073
                 6           7            8            9           10
AIC(n)    9.183716    9.193341     9.218945     9.227026     9.239990
HQ(n)     9.350098    9.385320     9.436522     9.470200     9.508761
SC(n)     9.595373    9.668329     9.757265     9.828678     9.904974
FPE(n) 9740.274202 9836.099441 10093.381659 10178.066025 10314.372775
                 11           12
AIC(n)     9.251481     9.244463
HQ(n)      9.545849     9.564428
SC(n)      9.979797    10.036110
FPE(n) 10437.867910 10369.951890

As we can see, each of the four information criteria statistics suggest that we use 2 lags in our VAR model, therefore in our specification we will use 2 lags so that our VAR model can be written mathematically as:

\[ y_{1,t} = c_{0,1} + b_{0,12}y_{2,t} + b_{1,11}y_{1,t-1} + b_{1,12}y_{2,t-1} + b_{2,11}y_{1,t-2} + b_{2,12}y_{2,t-2} + \varepsilon_{1,t}\] \[ y_{2,t} = c_{0,2} + b_{0,21}y_{1,t} + b_{1,21}y_{1,t-1} + b_{1,22}y_{2,t-1} + b_{2,21}y_{1,t-2} + b_{2,22}y_{2,t-2} + \varepsilon_{1,t}\] We code the above bivariate reduced form vAR model in R as:

var.model <- VAR(y, p = 2, type = "const")

Table 1: Reduced form Bivariate VAR Model

Dependent variable:
y1 y2
y1.l1 0.220*** 0.577
(0.065) (0.719)
y2.l1 0.016*** 0.087
(0.006) (0.068)
y1.l2 0.172*** -0.906
(0.063) (0.703)
y2.l2 0.024*** -0.090
(0.006) (0.069)
const 1.568*** 7.405**
(0.293) (3.256)
Observations 222 222
R2 0.235 0.023
Adjusted R2 0.221 0.005
Residual Std. Error (df = 217) 2.890 32.132
F Statistic (df = 4; 217) 16.665*** 1.298
Note: p<0.1; p<0.05; p<0.01

The first model has all statistically significant coefficients at the 1% significance level, whereas the second model only has one statistically significant coefficient. Therefore, we will want to eventually run a Granger causality test to test whether the lags of \(y_{1,t}\) should be included in the \(y_{2,t}\) model. It is also important to note that the residuals have a correlation of 16.4%, which is relatively low. Ideally, we want the residuals to be white noise and thus to have correlation that is statistically non-different from zero. Thus, to further examine the residuals we plot the residuals for each model below along with their ACFs and PACFs.

plot(var.model, names = "y1", lag.acf = 16, lag.pacf = 16)
plot(var.model, names = "y2", lag.acf = 16, lag.pacf = 16)

All of the ACF and PACF coefficients appear to be insignificant from zero. Thus, we say that the residuals are approximately white noise, and thus that our VAR model is approximately weakly stationary. To further examine the relationship between \(y_{1,t}\) and \(y_{2,t}\) we perform both an impluse response analysis and a forecast error variance decomposition.

Impulse Response Function (IRF)

The goal of the impulse response function is to track the response of one of our variables to a one time shock. In other words, we want to know how our dynamic system (the system of our variables as endogenoues functions of one another) responds to some external change which we call an impulse (sometimes called a brief input signal). The idea is we want to know how \(y_{1,t}\) and \(y_{2,t}\) respond to some exogenous shock. Specifically, in the context of this assignment, the impulse response analysis creates an impulse response function that measures the reaction of our endogenoues macroeconomic variables (real GDP and inflation adjusted S&P 500 returns) to the exogenous shock at the time of the shock and over subsequent points in time. Mathematically we say that we are obtaining the response of \(y_i\) across time to a one time increase in \(\varepsilon_{j,t}\) The impulse response function we calcualte is based on the Choleski Decomposition.

var.model.IRF <- irf(var.model, n.ahead = 20)
par(mfcol=c(2,2), cex = 0.7)
plot(var.model.IRF, plot.type = "single")

When there is an exogenous shock to real GDP the initial impact (when the shock hits) appears to affect inflation adjusted S&P 500 returns more than real GDP itself. The time to recover from the shock appears to be about the same for both real GDP and inflation adjsuted S&P 500 returns (approximately 6 quarters, 1 and a hlf years economically, and approximately 12 quarters, 3 years statistically). However, real GDP appears to recover with less variation across time as comparted to the S&P500 returns. The exogenous shock to real GDP appears to positively affect S&P 500 returns, but appears to create a type of overreaction since the returns drop quickly so much so that the deviation from the shock becomes negative.

When there is an exogenous shock to the inflation adjusted S&P500 returns there is little to no impact on the value of real GDP, whereas, the S&P 500 returns are affected drastically, by close to 30%. Not only is the affect of the shock small for real GDP but the recovery is also relatively quick between 6 and 8 quarters (1 and a half to about 2 years). However, we also note that although the shock drastically affects S&P 500 returns, the large affect only appears to occur up until the first quarter. After the first quarter the shock is still evident, but its affects are weak. After about the 3 quarter the affects of the shock, although statistically significant, appear to be economically insignificant.

We also note that because our time series is approximately weakly stationary, then the deviations from the shock eventually converge to zero as the system converges back to the long run equilibirum given by the mean of the system (the mean being the constant mean as defined in the weakly stationary time series). We also point out that a 1% change to GDP is a much larger difference than a 1% change in S&P500 returns due to the nature of these two variables. A 1% change to the S&P500 index can occur overnight, but a 1% change to GDP occurs slowly across a quarter or even over many quarters. However, since we are using quarterly data, this concern is mitigated at least in part.

Forecast Error Variance Decomposition (FEVD)

The goal of the forecast error variance decomposition is to find the fraction of the overall fluctuations in our VAR variables that is due to some shock. In other words the variance decomposition indicates the amount of information each variable contributes to the other variables in the VAR model. It determines how much of the forecast error variance of each of the variables can be explained by exogenous shocks to the other variables. We both plot and present the decomposition of the percent change in each variable due to the shocks of each variable in the system.

var.model.FEVD <- fevd(var.model, n.ahead = 12)
plot(var.model.FEVD, addbars = 2, col = c("red3", "royalblue3"))

    y1.e -> y1 y2.e -> y1 y1.e -> y2 y2.e -> y2
Q1      1.0000     0.0000     0.0269     0.9731
Q2      0.9711     0.0289     0.0309     0.9691
Q3      0.8960     0.1040     0.0364     0.9636
Q4      0.8885     0.1115     0.0368     0.9632
Q5      0.8865     0.1135     0.0369     0.9631
Q6      0.8861     0.1139     0.0369     0.9631
Q7      0.8860     0.1140     0.0369     0.9631
Q8      0.8860     0.1140     0.0369     0.9631
Q9      0.8860     0.1140     0.0369     0.9631
Q10     0.8860     0.1140     0.0369     0.9631
Q11     0.8860     0.1140     0.0369     0.9631
Q12     0.8860     0.1140     0.0369     0.9631

The first plot gives us the decomposition of real GDP of the fluctuation in real GDP caused by shocks to all variables in the VAR system. In other words, the first plot tells us the percent variation in real GDP caused by shocks to real GDP and shocks to the inflation adjusted S&P500 returns. As we can see, the variation in real GDP at the first quarter is 100% composed of the shock from real GDP. Over time the shock in the S&P500 returns accounts for more variation in real GDP, but real GDP fluctuations are almost entirely composed of shocks to real GDP. Also, after the sixth quarter the percent of variation that each shock explains remains constant. In short, the first plot (and the first and second columns of the table) show us that over time the exogenous shocks to the inflation adjusted S&P500 returns explain more variation in real GDP, but that most of the variation in real GDP is explained by the exogenous shock to real GDP itself.

The second plot gives us the decomposition of inflation adjusted S&P500 returns caused by shocks to all variables in the VAR system. In other words, the second plot tells us the percent variation in inflation adjusted S&P500 returns caused by shocks to real GDP and shocks to the inflation adjusted S&P500 returns. As we can see, the variation in inflation adjusted S&P500 returns at the first quarter is 97.31% composed of the shock from inflation adjusted S&P500 returns. Over time the shock in real GDP accounts for more variation in inflation adjusted S&P500 returns, however this proportion remains extremely small. After the fifth quarter the percent of variation that each shock explains remains constant.

It is also important to note that the shocks of the inflation adjusted S&P500 returns explains more percent variation in the fluctuation of real GDP than do the real GDP shocks in explaining the fluctuation in the inflation adjusted S&P500 returns.

Part B - Granger Causality Tests

In this section we perform a Granger causality test on both of our variables. This test examines whether lags of one variable in our VAR system need to be included in the equation of another variable within our VAR system. If past values of a variable \(X\) contain information that help predict today’s value of \(Z\) above and beyond the information containted in past values of \(Z\) alone, then we say that \(X\) Granger causes variable \(Z\). The null hypothesis of this test is that variable \(X\) does not Granger cause variable \(Z\), that is, past values of \(X\) do not contain more information than what is already included in past values of \(Z\) in explaining variation in \(Z\) today. The alternate hypothesis of this test states that variable \(X\) does Granger cause variable \(Z\). We now run the Granger Causality test over our VAR system, with the code below.

causality(var.model, cause = "y1")$Granger

    Granger causality H0: y1 do not Granger-cause y2

data:  VAR object var.model
F-Test = 0.90777, df1 = 2, df2 = 434, p-value = 0.4042
causality(var.model, cause = "y2")$Granger

    Granger causality H0: y2 do not Granger-cause y1

data:  VAR object var.model
F-Test = 11.738, df1 = 2, df2 = 434, p-value = 0.00001084

The first Granger test hypothesizes that real GDP does not Granger cause inflation adjusted S&P500 returns. The first test has a p-value of 0.4042, hence, we can not reject the null hypothesis. We interpret the results of this first test as saying that past values of real GDP do not include any more information than past values of inflation adjusted S&P500 returns in estimating today’s inflation adjusted S&P500 returns. This is most likely why we find no statistically signficiant coefficients in our VAR system for the \(y_{2,t}\) model earlier. Therefore, based on this test we should not include any of the real GDP lags in our equation for inflation adjusted S&P500 returns.

The second Granger test hypothesizes that inflation adjusted S&P500 returns do not Granger cause real GDP. The second test has a p-value of 0.000011, hence we can reject the null hypothesis easily at the 1% level. We interpret the results of this second test as saying that past values of inflation adjusted S&P500 returns do include more information than past real GDP values do alone in estimating today’s real GDP. This is most likely why we find all of the coefficeints in our first VAR equation to be statistically significant at the 1% level.

Part C - Restricted VAR Model

Based upon the Granger causality tests in part B, we update our VAR model so as to remove all the real GDP lags from the inflation adjusted S&P500 returns equation because real GDP does not Granger cause inflation adjusted S&P500 returns. Hence, our restricted VAR model can be written mathematically as:

\[ y_{1,t} = c_{0,1} + b_{0,12}y_{2,t} + b_{1,11}y_{1,t-1} + b_{1,12}y_{2,t-1} + b_{2,11}y_{1,t-2} + b_{2,12}y_{2,t-2} + \varepsilon_{1,t}\] \[ y_{2,t} = c_{0,2} + b_{0,21}y_{1,t} + b_{1,22}y_{2,t-1} + b_{2,22}y_{2,t-2} + \varepsilon_{1,t}\]

Our new VAR model is written in R by using a restriction matrix which we define below:

restriction.matrix <- matrix(1, nrow = 2, ncol = 5)
restriction.matrix[2, c(1,3)] <- 0

var.restricted <- restrict(var.model, method = "manual", resmat = restriction.matrix)

Table 2: Restricted Reduced form Bivariate VAR Model

Dependent variable:
Original y1 Original y2 Restricted y1 Restricted y2
y1.l1 0.220*** 0.577 0.220***
(0.065) (0.719) (0.065)
y2.l1 0.016*** 0.087 0.016*** 0.093
(0.006) (0.068) (0.006) (0.067)
y1.l2 0.172*** -0.906 0.172***
(0.063) (0.703) (0.063)
y2.l2 0.024*** -0.090 0.024*** -0.089
(0.006) (0.069) (0.006) (0.067)
const 1.568*** 7.405** 1.568*** 6.368***
(0.293) (3.256) (0.293) (2.232)
Observations 222 222 222 222
R2 0.235 0.023 0.586 0.053
Adjusted R2 0.221 0.005 0.577 0.040
Residual Std. Error 2.890 (df = 217) 32.132 (df = 217) 2.890 (df = 217) 32.118 (df = 219)
F Statistic 16.665*** (df = 4; 217) 1.298 (df = 4; 217) 61.548*** (df = 5; 217) 4.045*** (df = 3; 219)
Note: p<0.1; p<0.05; p<0.01

In Table 2 we present the original VAR model and the restricted VAR model coefficients. The restricted y1 model coefficients are unaffected and thus retain their statistical signficance at the 1% level. However, the restricted y2 equation coefficeints remain statistiaclly insignficant. However, this is not surprising since y2 is S&P500 returns. Effectively the restricted y2 equation suggests that we can model today’s S&P500 returns based upon the past two quarters S&P500 returns, which under the assumption that markets are efficeint, cannot happen. Hence, we should expect these coefficeints to remain statistically insignificant.

It is important to point out, however, that the adjusted \(R^2\) for both models increasing drastically, in both cases more than doubling. Hence, we believe that the restricted models are much better than the original models even though the statistical significance of the coefficeints did not change.

Part D - IRF and FEVD of the Restricted VAR Model

In this section we re-compute the IRF and FEVD that we computed in part B of the new, restricted VAR model. We also point out that the actual ordering of our variables in the VAR system matter for our IRF’s and FEVD’s. Therefore, not only will we recompute the IRF and FEVD, but we will compute two of each, one for each ordering of our variables in the system. We thus first create two different VAR models containing different orders of y1 and y2.

y.order1 <- cbind(y1, y2)
y.order2 <- cbind(y2, y1)

var.model.order1 <- VAR(y.order1, p = 2, type = "const")
var.model.order2 <- VAR(y.order2, p = 2, type = "const")

var.model.order1 <- restrict(var.model.order1, method = "manual", resmat = restriction.matrix)
var.model.order2 <- restrict(var.model.order2, method = "manual", resmat = restriction.matrix)

Below we plot the impulse response functions of each of the orderings of the restricted VAR models.

var.model.order1.IRF <- irf(var.model.order1, n.ahead = 20)
var.model.order2.IRF <- irf(var.model.order2, n.ahead = 20)
VAR System Order: y1 and y2

VAR System Order: y2 and y1

For both orderings, the impulse response from y2 (the shock comes to y2) seems to affect both y1 and y2 no matter how they are ordered in the VAR system. Similarly, if the exogenous shock comes from y1 then no matter how we order the variables in the system the shock affects y1 similarly. However, the ordering does matter when we consider how the exogenous shock from y1 affects y2, however the confidence intervals are similar, which implies that statistically these impulse response functions are not statistically different from one another.

Below we plot and report the forecast error variance decomposition of each of the orderings of the restricted VAR models.

var.model.order1.FEVD <- fevd(var.model.order1, n.ahead = 12)
var.model.order2.FEVD <- fevd(var.model.order2, n.ahead = 12)

VAR System Order: y1 and y2

    y1.e -> y1 y2.e -> y1 y1.e -> y2 y2.e -> y2
Q1      1.0000     0.0000     0.0269     0.9823
Q2      0.9712     0.0291     0.0269     0.9823
Q3      0.8953     0.1058     0.0269     0.9823
Q4      0.8877     0.1135     0.0269     0.9823
Q5      0.8860     0.1152     0.0269     0.9823
Q6      0.8856     0.1156     0.0269     0.9823
Q7      0.8854     0.1158     0.0269     0.9823
Q8      0.8854     0.1158     0.0269     0.9823
Q9      0.8854     0.1159     0.0269     0.9823
Q10     0.8854     0.1159     0.0269     0.9823
Q11     0.8854     0.1159     0.0269     0.9823
Q12     0.8854     0.1159     0.0269     0.9823

VAR System Order: y2 and y1

    y2.e -> y2 y1.e -> y2 y2.e -> y1 y1.e -> y1
Q1      1.0000     0.0000     0.0245     0.9847
Q2      0.9972     0.0029     0.0245     0.9847
Q3      0.9932     0.0069     0.0245     0.9847
Q4      0.9928     0.0073     0.0245     0.9847
Q5      0.9926     0.0074     0.0245     0.9847
Q6      0.9926     0.0075     0.0245     0.9847
Q7      0.9926     0.0075     0.0245     0.9847
Q8      0.9926     0.0075     0.0245     0.9847
Q9      0.9926     0.0075     0.0245     0.9847
Q10     0.9926     0.0075     0.0245     0.9847
Q11     0.9926     0.0075     0.0245     0.9847
Q12     0.9926     0.0075     0.0245     0.9847

Both FEVD functions show similar results to what we have already shown earlier. It is also important to point out that the proportions in the third and fourth column do not change because there is no y1 lags in y2 anymore because of the restriction imposed by the Granger causality test.

In short, the order of our variables in the VAR system do not matter much when looking at how exogenous shocks affects our system.

Part E - A Third Variable to the VAR System

As mentioned earlier in section II, we will now add a third variable, leading index for the United States, to our VAR system, which we will denote as \(y_{3,t}\). Furthermore, this variable only has data available from the first quarter of 1982, hence, we will restrict our other two time series variables in our model to start at this date as well.

data.LeadIndex <- Quandl("FRED/USSLIND", type = 'zoo', collapse = "quarterly")

sample.start <- 1981.75
sample.end <- 2016.75

x1 <- window(data.realGDP, start = sample.start, end = sample.end)
x2 <- window(data.GDPdeflator, start = sample.start, end = sample.end)
x3 <- window(data.SP500, start = sample.start, end = sample.end)
x4 <- window(data.LeadIndex, start = sample.start, end = sample.end)

y1 <- 400 * diff(log(x1), differences = 1)
y2 <- 400 * diff(log(x3), differences = 1) - diff(log(x2), differences = 1)
y3 <- x4

y <- cbind(y1, y2, y3)

As before we use the VARselect function to determine how many lags need to go in our model.

VARselect(y, lag.max = 12, type = "const")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     1      1      1      1 

$criteria
                1          2          3          4          5          6
AIC(n)   6.254559   6.282148   6.337139   6.440722   6.528350   6.575927
HQ(n)    6.363196   6.472263   6.608732   6.793792   6.962898   7.091952
SC(n)    6.521936   6.750059   7.005584   7.309699   7.597861   7.845972
FPE(n) 520.411420 535.111902 565.718538 628.181915 686.971751 722.383314
                7          8          9         10         11         12
AIC(n)   6.515174   6.505020   6.497792   6.488922   6.525162   6.644343
HQ(n)    7.112677   7.184001   7.258251   7.330858   7.448576   7.649235
SC(n)    7.985752   8.176132   8.369437   8.561100   8.797873   9.117588
FPE(n) 682.329106 678.762949 678.137225 677.477398 709.326770 808.536670

It seems as if every information criteria statistic chooses to use only one lag in our VAR model, hence, this is how many lags we will include.

var.model <- VAR(y, p = 1, type = "const")

We also perform a Granger causality test to determine if any restrictions need to be made on our model. As a reminder, the Granger causality tests will tell us if we need to exclude any of the other variable lags in each varaible’s equation in our system.

causality(var.model, cause = "y1")$Granger

    Granger causality H0: y1 do not Granger-cause y2 y3

data:  VAR object var.model
F-Test = 1.3316, df1 = 2, df2 = 405, p-value = 0.2652
causality(var.model, cause = "y2")$Granger

    Granger causality H0: y2 do not Granger-cause y1 y3

data:  VAR object var.model
F-Test = 4.5907, df1 = 2, df2 = 405, p-value = 0.01068
causality(var.model, cause = "y3")$Granger

    Granger causality H0: y3 do not Granger-cause y1 y2

data:  VAR object var.model
F-Test = 11.025, df1 = 2, df2 = 405, p-value = 0.00002176

The first Granger test shows us that we cannot reject the null hypothesis that y1 does not Granger cause y2 or y3. Hence, we will need to remove the y1 lag from both the y2 and y3 equations. The second test shows us that we can reject the null, so that we can say with approximately 99% confidence that y2 Granger causes both y1 and y3, hence we will include the lag of y2 in both of these equations. Similarly, the third test suggests that y3 Granger causes y1 and y2, hence we will keep the y3 lag in both the y1 and y2 equations.

The results of the Granger test thus encourage us to restrict our VAR model as we did before. Hence our restricted VAR model is given by the following code.

restriction.matrix <- matrix(1, nrow = 3, ncol = 4)
restriction.matrix[c(2,3), 1] <- 0

var.restricted <- restrict(var.model, method = "manual", resmat = restriction.matrix)

We summarize the results of our models with the below code. It is important to point out that the restricted models are better than the original models as shown by the significant increase in the \(R^2\) of each of the equations in the system. Furthermore, it is worthwhile to point out that the VAR system which now includes three variables appears to be better than the VAR system that included only two varaibles. Again, the reasining is beacuse the \(R^2\) signficantly improves and the statistical signficance of the coefficeints remains relatively unchanged. Hence, we can explain more variation with our three variable VAR system than our two varaible VAR system without losing any statistical power.

Table 3: Restricted Reduced form Bivariate VAR Model

Dependent variable:
Original y1 Original y2 Original y3 Restricted y1 Restricted y2 Restricted y3
y1.l1 0.074 0.260 0.032 0.074
(0.098) (1.522) (0.020) (0.098)
y2.l1 0.011* 0.054 0.003*** 0.011* 0.055 0.004***
(0.006) (0.088) (0.001) (0.006) (0.087) (0.001)
y3.l1 1.428*** 2.287 0.742*** 1.428*** 2.870 0.814***
(0.305) (4.740) (0.062) (0.305) (3.280) (0.044)
const 0.585* 4.469 0.236*** 0.585* 4.391 0.226***
(0.323) (5.026) (0.066) (0.323) (4.987) (0.066)
Observations 139 139 139 139 139 139
R2 0.348 0.011 0.755 0.704 0.079 0.929
Adjusted R2 0.334 -0.011 0.749 0.695 0.059 0.927
Residual Std. Error 2.054 (df = 135) 31.926 (df = 135) 0.420 (df = 135) 2.054 (df = 135) 31.812 (df = 136) 0.423 (df = 136)
F Statistic 24.031*** (df = 3; 135) 0.511 (df = 3; 135) 138.480*** (df = 3; 135) 80.102*** (df = 4; 135) 3.902** (df = 3; 136) 590.368*** (df = 3; 136)
Note: p<0.1; p<0.05; p<0.01

Part F - Forecasting

In this section we use our final three variable restricted VAR system that we obtained in part E to forecast the real GDP growth rate for the first quarter of 2017. We will also compare our forecast to three well known entities that have also forecasted the real GDP growth rate for the first quarter of 2017, namely, (1) The Fedearl Bank of New York Nowcast, (2) the GDPNow Federal Bank of Atlanta forecast, and (3) the Wall Street Journal Economic Forecasting Survey’s minimum, average, and maximum forecast values.

We start by computing the forecast using our restricted three variable VAR model and then plot the results in a graph. The solid black line shows the actual forecast values for the next four quarters, and the lighter grey lines protruding from the center of this line represnt the confidence intervals.

var.forecast <- predict(var.restricted, n.ahead = 4)
fanchart(var.forecast)

Specifically, we will now compare the forecast of y1 (real GDP) for the first quarter of 2017 to five other measures.

nowcast.forecast <- 2.6
GDPnow.forecast <- 0.5
WSJ.forecast.max <- 3.4
WSJ.forecast.avg <- 1.4
WSJ.forecast.min <- 0.6

my.forecast <- round(var.forecast$fcst[["y1"]][1,1], 3)

rbind(nowcast.forecast, GDPnow.forecast, WSJ.forecast.max, WSJ.forecast.avg, WSJ.forecast.min, my.forecast)
                 fcst
nowcast.forecast 2.60
GDPnow.forecast  0.50
WSJ.forecast.max 3.40
WSJ.forecast.avg 1.40
WSJ.forecast.min 0.60
my.forecast      3.09

Our forecast for the real GDP in the first quarter of 2017 is 3.09%, which is much higher than any of the other forecast values, except for the maximum of the Wall Street Journals forecast.

IV. Conclusion

In short, we have successfully forecasted real GDP growth rate using a three varaible vector autoregression model. We forecast that real GDP growth rate for the first quarter of 2017 to be 3.09%.

Note: the tables were produced by the stargazer package created by Hlavac, Marek (2015).