I first obtain quarterly time series data for real GDP (FRED/GDPC1), the GDP deflator (FRED/GDPDEF), and the quarterly closing value of the S&P 500 Index (YAHOO/INDEX_GSPC).
First we load the Quandl package and the other packages we will need.
library("Quandl")
library("vars")
library("gdata")
library("stargazer")
Then we construct \(y_{1,t}=\Delta \log GDP_t\) and \(y_{2,t}=\Delta \log SP500_t-\Delta \log p_t^{GDP}\).
First we take first differences of the log of real GDP (multiplied by 100, so the series is in percent). For all of our datasets we restrict the sample to 1950 Q1 through 2015 Q4.
gdp<-Quandl("FRED/GDPC1", type="zoo")
gdp<-window(gdp,start="1950 Q1", end="2015 Q4")
lgdp<-log(gdp)
ldgdp<-diff(lgdp,1)*100
par(mfrow=c(1,2))
plot(gdp,xlab="Time", ylab="GDP")
plot(ldgdp,xlab="Time", ylab="First Diff Log GDP")
Then we find the first differences of the log of the GDP deflator, \(\Delta \log p_t^{GDP}\).
gdpd<-Quandl("FRED/GDPDEF", type="zoo")
gdpd<-window(gdpd,start="1950 Q1", end="2015 Q4")
lgdpd<-log(gdpd)
ldgdpd<-diff(lgdpd,1)*100
par(mfrow=c(1,2))
plot(gdpd,xlab="Time", ylab="GDP Deflator")
plot(ldgdpd,xlab="Time", ylab="First Diff Log GDPDEF")
To obtain the real log return of the S&P 500, \(y_{2,t}=\Delta \log SP500_t-\Delta \log p_t^{GDP}\), we subtract the log differences of the GDP deflator from the log differences of the S&P 500.
snp<-Quandl("YAHOO/INDEX_GSPC", collapse="quarterly", type="zoo")
snp<-window(snp,start="1950 Q1", end="2015 Q4")
lsnp<-log(snp$Close)
ldsnp<-diff(lsnp,1)*100
par(mfrow=c(1,2))
plot(lsnp,xlab="Time", ylab="Log SNP 500")
plot(ldsnp,xlab="Time", ylab="First Diff Log SNP")
Now that we have all the components, let's merge them together and drop the missing values.
y<-cbind(ldsnp,ldgdpd,ldgdp)
y<-na.trim(y)
So now that we have \(\Delta \log SP500_t\) and \(\Delta \log p_t^{GDP}\), we subtract them to get \(y_{2,t}\). We also demean the data; since the VAR will include a constant anyway, this leaves the slope coefficients unchanged and simply makes the estimated intercepts negligible.
y2<-y$ldsnp-y$ldgdpd
y1<-y$ldgdp
yy<-cbind(y1,y2)
yy<-sweep(yy,2,apply(yy,2,mean))
plot(yy)
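As a quick sanity check that the demeaning worked, the column means should now be numerically zero:
# means of the demeaned series (should be ~0 up to rounding error)
round(apply(yy, 2, mean), 10)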
Now we take \(y_t=(y_{1,t},y_{2,t})'\) for the period 1951 Q1 to 2014 Q4. Then we estimate a reduced-form VAR.
yy<-window(yy,start="1951 Q1", end="2014 Q4")
VARselect(yy, lag.max=10)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 2 1 2
##
## $criteria
## 1 2 3 4 5 6
## AIC(n) 3.781582 3.725213 3.736946 3.763633 3.781136 3.802590
## HQ(n) 3.816008 3.782589 3.817271 3.866909 3.907362 3.951766
## SC(n) 3.867078 3.867706 3.936436 4.020121 4.094621 4.173073
## FPE(n) 43.885536 41.480544 41.970897 43.107576 43.871083 44.825965
## 7 8 9 10
## AIC(n) 3.811918 3.840491 3.860155 3.877568
## HQ(n) 3.984044 4.035568 4.078181 4.118544
## SC(n) 4.239397 4.324968 4.401629 4.476039
## FPE(n) 45.250817 46.568888 47.501981 48.346834
The model with the lowest AIC has the best fit; here that is the VAR with \(p=2\) lags.
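If we wanted to pull the chosen lag length out programmatically instead of reading it off the table, the $selection element returned by VARselect can be indexed directly:
# lag length chosen by AIC
p.aic <- VARselect(yy, lag.max=10)$selection["AIC(n)"]
p.aic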
Let's estimate the VAR(2) and check what the residual correlation matrix looks like.
var2<-VAR(yy,p=2,type="const")
summary(var2)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: y1, y2
## Deterministic variables: const
## Sample size: 254
## Log Likelihood: -1184.933
## Roots of the characteristic polynomial:
## 0.4454 0.4454 0.304 0.304
## Call:
## VAR(y = yy, p = 2, type = "const")
##
##
## Estimation results for equation y1:
## ===================================
## y1 = y1.l1 + y2.l1 + y1.l2 + y2.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## y1.l1 0.245040 0.061477 3.986 8.84e-05 ***
## y2.l1 0.023097 0.006405 3.606 0.000375 ***
## y1.l2 0.079349 0.059543 1.333 0.183872
## y2.l2 0.026300 0.006572 4.002 8.29e-05 ***
## const -0.018032 0.050452 -0.357 0.721085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.8038 on 249 degrees of freedom
## Multiple R-Squared: 0.2309, Adjusted R-squared: 0.2185
## F-statistic: 18.68 on 4 and 249 DF, p-value: 1.91e-13
##
##
## Estimation results for equation y2:
## ===================================
## y2 = y1.l1 + y2.l1 + y1.l2 + y2.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## y1.l1 0.256054 0.609111 0.420 0.6746
## y2.l1 0.106501 0.063457 1.678 0.0945 .
## y1.l2 -0.819353 0.589953 -1.389 0.1661
## y2.l2 -0.040573 0.065115 -0.623 0.5338
## const -0.007916 0.499874 -0.016 0.9874
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 7.964 on 249 degrees of freedom
## Multiple R-Squared: 0.02087, Adjusted R-squared: 0.00514
## F-statistic: 1.327 on 4 and 249 DF, p-value: 0.2605
##
##
##
## Covariance matrix of residuals:
## y1 y2
## y1 0.646 0.87
## y2 0.870 63.42
##
## Correlation matrix of residuals:
## y1 y2
## y1 1.0000 0.1359
## y2 0.1359 1.0000
causality(var2,cause="y1")
## $Granger
##
## Granger causality H0: y1 do not Granger-cause y2
##
## data: VAR object var2
## F-Test = 0.96654, df1 = 2, df2 = 498, p-value = 0.3811
##
##
## $Instant
##
## H0: No instantaneous causality between: y1 and y2
##
## data: VAR object var2
## Chi-squared = 4.6072, df = 1, p-value = 0.03184
causality(var2,cause="y2")
## $Granger
##
## Granger causality H0: y2 do not Granger-cause y1
##
## data: VAR object var2
## F-Test = 15.762, df1 = 2, df2 = 498, p-value = 2.305e-07
##
##
## $Instant
##
## H0: No instantaneous causality between: y2 and y1
##
## data: VAR object var2
## Chi-squared = 4.6072, df = 1, p-value = 0.03184
So we see that there is no Granger causality from \(y_1\) to \(y_2\), but there is Granger causality from \(y_2\) to \(y_1\). This means we can use the real S&P 500 return to help predict real GDP growth, but not the other way around.
Based on the Granger causality results, in the first model we restrict the contemporaneous matrix \(B_0\) so that \(y_1\) does not affect \(y_2\) contemporaneously. We set up \(B_0\) (passed to SVAR() as Amat):
B0 <- diag(2)
diag(B0) <- NA
B0[1, 2] <- NA
B0
## [,1] [,2]
## [1,] NA NA
## [2,] 0 NA
svar2 <- SVAR(var2, estmethod="direct", Amat=B0, hessian=TRUE, method="BFGS")
## Warning in SVAR(var2, estmethod = "direct", Amat = B0, hessian = TRUE,
## method = "BFGS"): The A-model is just identified. No test possible.
summary(svar2)
##
## SVAR Estimation Results:
## ========================
##
## Call:
## SVAR(x = var2, estmethod = "direct", Amat = B0, hessian = TRUE,
## method = "BFGS")
##
## Type: A-model
## Sample size: 254
## Log Likelihood: -1189.981
## Method: direct
## Number of iterations: 54
## Convergence code: 0
##
## Estimated A matrix:
## y1 y2
## y1 1.256 0.01723
## y2 0.000 0.12557
##
## Estimated standard errors for A matrix:
## y1 y2
## y1 0.05572 0.007916
## y2 0.00000 0.005571
##
## Estimated B matrix:
## y1 y2
## y1 1 0
## y2 0 1
##
## Covariance matrix of reduced form residuals (*100):
## y1 y2
## y1 64.6 -87
## y2 -87.0 6342
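Since this is an A-model (\(A u_t = B \varepsilon_t\) with \(B=I\)), the contemporaneous impact of the structural shocks on the reduced-form innovations is \(A^{-1}B\); a quick sketch using the A and B matrices stored in the SVAR object:
# impact matrix of the structural shocks: u_t = A^(-1) B e_t
solve(svar2$A) %*% svar2$B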
This is the structural model that removes the effect of \(y_1\) on \(y_2\) through \(B_0\), motivated by the Granger causality test. Let's instead impose the restriction on the lag coefficients, i.e. the \(B_1\) (and \(B_2\)) matrices. Remember, even though the lag coefficients fill 2 rows and 4 columns, the restriction matrix needs a fifth column for the intercept.
mat.r<- matrix(1,nrow=2, ncol=5)
mat.r[2,c(1,3)]<-0
mat.r
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1 1 1 1
## [2,] 0 1 0 1 1
var2.r <- restrict(var2, method="manual", resmat=mat.r)
summary(var2.r)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: y1, y2
## Deterministic variables: const
## Sample size: 254
## Log Likelihood: -1185.934
## Roots of the characteristic polynomial:
## 0.4297 0.2121 0.2121 0.1847
## Call:
## VAR(y = yy, p = 2, type = "const")
##
##
## Estimation results for equation y1:
## ===================================
## y1 = y1.l1 + y2.l1 + y1.l2 + y2.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## y1.l1 0.245040 0.061477 3.986 8.84e-05 ***
## y2.l1 0.023097 0.006405 3.606 0.000375 ***
## y1.l2 0.079349 0.059543 1.333 0.183872
## y2.l2 0.026300 0.006572 4.002 8.29e-05 ***
## const -0.018032 0.050452 -0.357 0.721085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.8038 on 249 degrees of freedom
## Multiple R-Squared: 0.2315, Adjusted R-squared: 0.2161
## F-statistic: 15 on 5 and 249 DF, p-value: 7.118e-13
##
##
## Estimation results for equation y2:
## ===================================
## y2 = y2.l1 + y2.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## y2.l1 0.11089 0.06304 1.759 0.0798 .
## y2.l2 -0.04500 0.06304 -0.714 0.4760
## const 0.00313 0.49962 0.006 0.9950
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 7.963 on 251 degrees of freedom
## Multiple R-Squared: 0.01327, Adjusted R-squared: 0.001474
## F-statistic: 1.125 on 3 and 251 DF, p-value: 0.3395
##
##
##
## Covariance matrix of residuals:
## y1 y2
## y1 0.646 0.87
## y2 0.870 63.91
##
## Correlation matrix of residuals:
## y1 y2
## y1 1.0000 0.1354
## y2 0.1354 1.0000
We get different results when we restrict the \(B_0\) matrix versus the \(B_1\) matrix. The latter is the correct way to impose the Granger restriction, because the Granger causality test concerns the lagged coefficients and takes the contemporaneous matrix \(B_0\) on the left-hand side of the equation as given.
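One quick way to judge whether the lag restriction costs much fit is a likelihood-ratio comparison of the restricted and unrestricted models (a sketch, not part of the original analysis; two coefficients are zeroed out, hence two degrees of freedom):
# LR test of the two zero restrictions in the y2 equation
lr <- 2 * (as.numeric(logLik(var2)) - as.numeric(logLik(var2.r)))
pchisq(lr, df = 2, lower.tail = FALSE)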
Now we estimate a restricted VAR that drops every variable with a t-statistic below 2 in absolute value.
mat.r2<- matrix(1,nrow=2, ncol=5)
mat.r2[1,c(3)]<-0
mat.r2[2,c(1,2,3,4)]<-0
mat.r2
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1 0 1 1
## [2,] 0 0 0 0 1
var2.r2 <- restrict(var2, method="manual", resmat=mat.r2)
summary(var2.r2)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: y1, y2
## Deterministic variables: const
## Sample size: 254
## Log Likelihood: -1188.827
## Roots of the characteristic polynomial:
## 0.2734 0 0 0
## Call:
## VAR(y = yy, p = 2, type = "const")
##
##
## Estimation results for equation y1:
## ===================================
## y1 = y1.l1 + y2.l1 + y2.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## y1.l1 0.273418 0.057761 4.734 3.70e-06 ***
## y2.l1 0.022632 0.006405 3.533 0.000488 ***
## y2.l2 0.026628 0.006578 4.048 6.88e-05 ***
## const -0.019029 0.050525 -0.377 0.706772
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.805 on 250 degrees of freedom
## Multiple R-Squared: 0.226, Adjusted R-squared: 0.2136
## F-statistic: 18.25 on 4 and 250 DF, p-value: 3.611e-13
##
##
## Estimation results for equation y2:
## ===================================
## y2 = const
##
## Estimate Std. Error t value Pr(>|t|)
## const 0.001059 0.500968 0.002 0.998
##
##
## Residual standard error: 7.984 on 253 degrees of freedom
## Multiple R-Squared: 1.767e-08, Adjusted R-squared: -0.003953
## F-statistic: 4.469e-06 on 1 and 253 DF, p-value: 0.9983
##
##
##
## Covariance matrix of residuals:
## y1 y2
## y1 0.6506 0.8224
## y2 0.8224 64.7701
##
## Correlation matrix of residuals:
## y1 y2
## y1 1.0000 0.1267
## y2 0.1267 1.0000
lm2 <- var2$varresult
lm.r <- var2.r$varresult
lm.r2 <- var2.r2$varresult
stargazer(lm2$y1, lm2$y2, lm.r$y1, lm.r$y2, lm.r2$y1, lm.r2$y2,
type="html", column.labels=rep(colnames(yy), 3),
dep.var.labels.include=FALSE)
| | y1 (1) | y2 (2) | y1 (3) | y2 (4) | y1 (5) | y2 (6) |
|---|---|---|---|---|---|---|
| y1.l1 | 0.245*** (0.061) | 0.256 (0.609) | 0.245*** (0.061) | | 0.273*** (0.058) | |
| y2.l1 | 0.023*** (0.006) | 0.107* (0.063) | 0.023*** (0.006) | 0.111* (0.063) | 0.023*** (0.006) | |
| y1.l2 | 0.079 (0.060) | -0.819 (0.590) | 0.079 (0.060) | | | |
| y2.l2 | 0.026*** (0.007) | -0.041 (0.065) | 0.026*** (0.007) | -0.045 (0.063) | 0.027*** (0.007) | |
| const | -0.018 (0.050) | -0.008 (0.500) | -0.018 (0.050) | 0.003 (0.500) | -0.019 (0.051) | 0.001 (0.501) |
| Observations | 254 | 254 | 254 | 254 | 254 | 254 |
| R2 | 0.231 | 0.021 | 0.231 | 0.013 | 0.226 | 0.00000 |
| Adjusted R2 | 0.219 | 0.005 | 0.216 | 0.001 | 0.214 | -0.004 |
| Residual Std. Error | 0.804 (df = 249) | 7.964 (df = 249) | 0.804 (df = 249) | 7.963 (df = 251) | 0.805 (df = 250) | 7.984 (df = 253) |
| F Statistic | 18.685*** (df = 4; 249) | 1.327 (df = 4; 249) | 15.001*** (df = 5; 249) | 1.125 (df = 3; 251) | 18.251*** (df = 4; 250) | 0.00000 (df = 1; 253) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | | | | | |
So I plot the IRFs for the VAR model, with shocks identified by the Cholesky decomposition.
var2.irfs <- irf(var2, n.ahead=40)
par(mfcol=c(2,2), cex=0.6)
plot(var2.irfs, plot.type="single")
Here is the FEVD (forecast error variance decomposition).
var2.fevd <- fevd(var2, n.ahead=40)
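# rows shown below correspond to forecast horizons 1, 4, 8 and 40 quarters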
var2.fevd[[1]][c(1,4,8,40),]
## y1 y2
## [1,] 1.0000000 0.0000000
## [2,] 0.8607464 0.1392536
## [3,] 0.8591922 0.1408078
## [4,] 0.8591882 0.1408118
var2.fevd[[2]][c(1,4,8,40),]
## y1 y2
## [1,] 0.01847382 0.9815262
## [2,] 0.02623100 0.9737690
## [3,] 0.02642786 0.9735721
## [4,] 0.02642813 0.9735719
plot(var2.fevd)
plot(var2.fevd, addbars=8)
Now let's reverse the order of \(y_1\) and \(y_2\), re-estimate the VAR, and plot the IRF and the FEVD.
yy2<-cbind(y2,y1)
yy2<-window(yy2,start="1951 Q1", end="2014 Q4")
VARselect(yy2, lag.max=10)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 2 1 2
##
## $criteria
## 1 2 3 4 5 6
## AIC(n) 3.781582 3.725213 3.736946 3.763633 3.781136 3.802590
## HQ(n) 3.816008 3.782589 3.817271 3.866909 3.907362 3.951766
## SC(n) 3.867078 3.867706 3.936436 4.020121 4.094621 4.173073
## FPE(n) 43.885536 41.480544 41.970897 43.107576 43.871083 44.825965
## 7 8 9 10
## AIC(n) 3.811918 3.840491 3.860155 3.877568
## HQ(n) 3.984044 4.035568 4.078181 4.118544
## SC(n) 4.239397 4.324968 4.401629 4.476039
## FPE(n) 45.250817 46.568888 47.501981 48.346834
var22<-VAR(yy2,p=2,type="const")
var22.irfs <- irf(var22, n.ahead=40)
par(mfcol=c(2,2), cex=0.6)
plot(var22.irfs, plot.type="single")
var22.fevd <- fevd(var22, n.ahead=40)
var22.fevd[[1]][c(1,4,8,40),]
## y2 y1
## [1,] 1.0000000 0.000000000
## [2,] 0.9935239 0.006476147
## [3,] 0.9934082 0.006591832
## [4,] 0.9934079 0.006592058
var22.fevd[[2]][c(1,4,8,40),]
## y2 y1
## [1,] 0.01847382 0.9815262
## [2,] 0.18059627 0.8194037
## [3,] 0.18207471 0.8179253
## [4,] 0.18207906 0.8179209
plot(var22.fevd)
plot(var22.fevd, addbars=8)
The charts tell basically the same story, just with the ordering reversed. We can say that the S&P 500 predicts GDP because the S&P 500 incorporates more real-time economic information than GDP does. GDP figures are released about a month after each quarter ends, while the S&P 500 is continually updated as investors revise their expectations using company earnings announcements, weekly unemployment claims, monthly manufacturing numbers, and so on. By the time GDP is reported, that real-time, backward-looking, and forward-looking information is already baked into the S&P 500, whereas GDP itself is largely backward looking. This is why we see a consistent relationship between the two variables regardless of the order in which we put them.