The Frisch-Waugh-Lovell (FWL) theorem is a simple yet powerful result that allows us to reduce a multivariate regression to a univariate one. Below we simulate data in R and apply the FWL theorem in an IV setting.
# set seed for exact replication
set.seed(10101)
# model parameters:
n = 1000 # sample size
m = 5 # number of IVs
rho = 0.8 # correlation b/w the error terms causing endogeneity problem
c = 0.15 # coefficient values of the instruments
# DGP:
z <- matrix(rnorm(n*m,mean=0,sd=1), n, m) # (n x m) matrix of random-normal draws
sigma_v = 1 # define SD of error term v
v = rnorm(n, mean = 0, sd = sigma_v) # array of length n with random-normal draws
# define error term u in terms of the correlation btw. u and v using
# bivariate Cholesky decomposition for given rho
u = rho * v + sqrt(1 - rho^2) * rnorm(n, mean = 0, sd = sigma_v)
pi = rep(1, m) * c # define the first stage coefficient vector of the instruments
x = z %*% pi + v # generate x data
y = 1 + 2 * x + u # generate y data
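Before estimating anything, a quick sanity check on the DGP (added here for illustration): u and v should be correlated at roughly \(\rho\), and x should inherit a positive correlation with u, which is exactly the endogeneity problem.
cor(u, v) # ~ 0.8 by the Cholesky construction above
cor(x, u) # positive: x is endogenous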
# Estimate the model and save the results in object "beta_OLS"
ols_data <- data.frame(cbind(x,y))
colnames(ols_data) <- c("x", "y")
beta_OLS <- lm(y ~ x, data = ols_data)
# Print the result in the console
summary(beta_OLS)
##
## Call:
## lm(formula = y ~ x, data = ols_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.14127 -0.47056 -0.00055 0.45572 2.15043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.01646 0.02096 48.48 <2e-16 ***
## x 2.67123 0.02013 132.71 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6629 on 998 degrees of freedom
## Multiple R-squared: 0.9464, Adjusted R-squared: 0.9463
## F-statistic: 1.761e+04 on 1 and 998 DF, p-value: < 2.2e-16
Note that, given the structural equation
\[ y_i = \alpha + \beta x_i + u_i, \]
since \(\rho = 0.8 > 0\) the regressor is positively correlated with the error term:
\[ Cov(x_i, u_i) > 0, \]
which implies that \(\hat{\beta}_{OLS}\) overestimates the true \(\beta = 2\):
\[ \hat{\beta}_{OLS}=\frac{Cov(x,y)}{Var(x)}=\frac{Cov(x,\alpha+\beta x +u)}{Var(x)}=\beta\frac{Var(x)}{Var(x)}+\frac{Cov(x,u)}{Var(x)}=\beta+\underbrace{\frac{Cov(x,u)}{Var(x)}}_{>0}>\beta \]
This is exactly what we see above: the estimated slope of 2.67 lies well above the true value of 2.
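Because u is observed in the simulation (a luxury we never have with real data), we can verify this decomposition directly: the sample analogue of \(\beta + Cov(x,u)/Var(x)\) reproduces the OLS slope.
# true beta plus the sample bias term equals the OLS slope exactly,
# since the slope of a simple regression with intercept is cov(x,y)/var(x)
2 + cov(x, u) / var(x) # ~ 2.67, matching coef(beta_OLS)["x"]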
To see how the instruments fix this, consider the projection matrix \(P_Z\):
\[ P_Z = Z(Z'Z)^{-1}Z', \]
where \(Z\) is the \(n\times m\) matrix of instruments. In R:
pz <- z %*% solve(t(z) %*% z) %*% t(z) # base R: solve() inverts Z'Z
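As a quick numerical check (an addition here, not strictly needed), \(P_Z\) is symmetric and idempotent up to floating-point error:
max(abs(pz - t(pz)))     # symmetry: P = P'
max(abs(pz %*% pz - pz)) # idempotency: P P = P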
Using these properties of the projection matrix, the 2SLS estimator is defined as:
\[ \hat{\beta}_{2SLS}=(X'P_ZX)^{-1}X'P_Zy=(X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y \]
A <- t(x) %*% pz       # X'P_Z
B <- A %*% x           # X'P_Z X (a 1 x 1 matrix, since we have a single regressor)
C <- t(x) %*% pz %*% y # X'P_Z y
beta_2SLS <- solve(B) %*% C
print(beta_2SLS)
## [,1]
## [1,] 1.843433
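Note that this matrix computation omits the constant, even though the DGP includes an intercept of 1. Below is a minimal sketch of the same formula with a column of ones appended to both \(X\) and \(Z\) (the helper objects Xc, Zc, Pc are introduced here just for this comparison); it should line up with the ivreg estimates reported further below.
Xc <- cbind(1, x) # regressor matrix including a constant
Zc <- cbind(1, z) # instrument matrix including a constant
Pc <- Zc %*% solve(t(Zc) %*% Zc) %*% t(Zc)
beta_2SLS_c <- solve(t(Xc) %*% Pc %*% Xc) %*% (t(Xc) %*% Pc %*% y)
beta_2SLS_c # first row: intercept, second row: slope on x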
With 2SLS we are therefore much closer to the true parameter \(\beta = 2\) than with OLS (the small gap to the ivreg estimate below is due to the omitted constant). Equivalently, we can compute the 2SLS slope in two explicit OLS stages. First, regress \(x\) on the instruments (the first stage):
ols_data_2 <- data.frame(cbind(x, z))
colnames(ols_data_2) <- c("x", "z1", "z2", "z3", "z4", "z5")
pi_IV <- lm(x ~ z1 + z2 + z3 + z4 + z5, data = ols_data_2)
# Print the estimated first-stage coefficients in the console
round(coef(pi_IV), 5)
## (Intercept)          z1          z2          z3          z4          z5
##    -0.01140     0.17461     0.13820     0.13268     0.11829     0.14288
Now we construct the fitted values from the first stage (using fitted() rather than retyping the rounded coefficients printed above):
x_hat <- fitted(pi_IV) # x_hat = intercept + Z %*% pi_hat
ols_data_3 <- data.frame(cbind(x_hat,y))
colnames(ols_data_3) <- c("x_hat", "y")
beta_IV <- lm(y ~ x_hat, data = ols_data_3)
# Print the result in the console
summary(beta_IV)
##
## Call:
## lm(formula = y ~ x_hat, data = ols_data_3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5274 -1.8815 0.0739 1.8057 10.0676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.00645 0.08855 11.366 < 2e-16 ***
## x_hat 1.85110 0.27135 6.822 1.55e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.798 on 998 degrees of freedom
## Multiple R-squared: 0.04455, Adjusted R-squared: 0.0436
## F-statistic: 46.54 on 1 and 998 DF, p-value: 1.555e-11
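This is where the FWL theorem comes in: partialling the constant out of both \(y\) and \(\hat{x}\) (which here just means demeaning) reduces the second stage to a univariate regression through the origin with the same slope. A short check, with helper objects y_dm and xh_dm introduced only for this illustration:
y_dm  <- drop(y) - mean(y)    # demean y (drop() turns the n x 1 matrix into a vector)
xh_dm <- x_hat - mean(x_hat)  # demean the first-stage fitted values
coef(lm(y_dm ~ 0 + xh_dm))    # regression through the origin: same slope, ~1.8511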
The IV coefficient using the "ivreg" package in R is 1.8511, the same as with the two-stage procedure above. Note, however, that only ivreg reports valid standard errors: the naive second-stage OLS standard error (0.27135) ignores the fact that \(\hat{x}\) is itself estimated.
library(ivreg)
## Warning: package 'ivreg' was built under R version 4.3.3
iv_data <- data.frame(cbind(x,y,z))
colnames(iv_data) <- c("x", "y", "z1", "z2", "z3", "z4", "z5")
beta_IV_2 <- ivreg(y ~ x | z1 + z2 + z3 + z4 + z5, data = iv_data)
summary(beta_IV_2)
##
## Call:
## ivreg(formula = y ~ x | z1 + z2 + z3 + z4 + z5, data = iv_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.79408 -0.72668 0.04136 0.71690 4.06361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.00645 0.03424 29.40 <2e-16 ***
## x 1.85110 0.10491 17.64 <2e-16 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 5 994 21.608 <2e-16 ***
## Wu-Hausman 1 997 220.065 <2e-16 ***
## Sargan 4 NA 1.924 0.75
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.082 on 998 degrees of freedom
## Multiple R-Squared: 0.8572, Adjusted R-squared: 0.857
## Wald test: 311.3 on 1 and 998 DF, p-value: < 2.2e-16
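To close the loop on the standard errors, here is a sketch of the textbook homoskedastic 2SLS variance \(\hat{\sigma}^2(X'P_ZX)^{-1}\), reusing Xc, Pc and beta_2SLS_c from the matrix computation with a constant above. The key point is that \(\hat{\sigma}^2\) must be computed from the structural residuals \(y - X\hat{\beta}_{2SLS}\), not from the second-stage residuals \(y - \hat{X}\hat{\beta}_{2SLS}\); the result should be close to the standard errors ivreg reports.
res    <- y - Xc %*% beta_2SLS_c # structural residuals: y - X*beta, NOT y - X_hat*beta
sigma2 <- sum(res^2) / (n - 2)   # error variance estimate, df = n minus 2 coefficients
sqrt(diag(sigma2 * solve(t(Xc) %*% Pc %*% Xc))) # ~ 0.03424 and 0.10491, as in ivreg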