Exercise 1

The Frisch-Waugh-Lovell (FWL) theorem is a simple yet powerful result that lets us reduce a multivariate regression to a univariate one by partialling out the other regressors. Simulate data in R and apply the FWL theorem to it in an IV setting.

# set seed for exact replication
set.seed(10101)
# model parameters:
n = 1000 # sample size
m = 5 # number of IVs
rho = 0.8 # correlation b/w the error terms causing endogeneity problem
c = 0.15 # coefficient values of the instruments
# DGP:
z <- matrix(rnorm(n * m, mean = 0, sd = 1), n, m) # (n x m) matrix of random-normal draws
sigma_v = 1 # define SD of error term v
v = rnorm(n, mean = 0, sd = sigma_v) # vector of length n with random-normal draws
# define error term u in terms of the correlation btw. u and v using
# a bivariate Cholesky decomposition for given rho
u = rho * v + sqrt(1 - rho^2) * rnorm(n, mean = 0, sd = sigma_v)
pi = rep(1, m) * c # define the first-stage coefficient vector of the instruments
x = z %*% pi + v # generate x data
y = 1 + 2 * x + u # generate y data (true intercept 1, true slope beta = 2)
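
Before turning to the IV estimation, a quick illustration of what FWL says on this simulated data (a minimal sketch, not part of the exercise code; the object names below are ours): in a regression of y on x and, say, the first instrument, the coefficient on x can be recovered by first partialling the instrument out of both y and x and then regressing residuals on residuals.

# FWL check (illustrative): coefficient on x in y ~ x + z1 ...
fwl <- data.frame(y = y, x = x, z1 = z[, 1])
coef(lm(y ~ x + z1, data = fwl))["x"]
# ... equals the slope from the residual-on-residual regression:
res_y <- resid(lm(y ~ z1, data = fwl)) # y with z1 partialled out
res_x <- resid(lm(x ~ z1, data = fwl)) # x with z1 partialled out
coef(lm(res_y ~ res_x))["res_x"]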

Ex. 1.a

# Estimate the model and save the results in the object "beta_OLS"
ols_data <- data.frame(cbind(x,y))
colnames(ols_data) <- c("x", "y")
beta_OLS <- lm(y ~ x, data = ols_data)
# Print the result in the console
summary(beta_OLS)
## 
## Call:
## lm(formula = y ~ x, data = ols_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.14127 -0.47056 -0.00055  0.45572  2.15043 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.01646    0.02096   48.48   <2e-16 ***
## x            2.67123    0.02013  132.71   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6629 on 998 degrees of freedom
## Multiple R-squared:  0.9464, Adjusted R-squared:  0.9463 
## F-statistic: 1.761e+04 on 1 and 998 DF,  p-value: < 2.2e-16

Note that, given:

\[ y_i = \alpha + \beta x_i + u_i \] Since \(\rho=0.8>0\), it follows that:

\[ Cov(x_i, u_i)>0 \]

This implies that \(\hat{\beta}_{OLS}\) overestimates the true \(\beta\) in large samples:

\[ \mathrm{plim}\,\hat{\beta}_{OLS}=\frac{Cov(x,y)}{Var(x)}=\frac{Cov(x,\alpha+\beta x +u)}{Var(x)}=\beta\frac{Var(x)}{Var(x)}+\frac{Cov(x,u)}{Var(x)}=\beta+\underbrace{\frac{Cov(x,u)}{Var(x)}}_{>0}>\beta \]
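
A back-of-the-envelope check using the DGP parameters confirms this: \(Cov(x,u)=\rho\,\sigma_v^2=0.8\) and \(Var(x)=m\,c^2+\sigma_v^2=1.1125\), so the implied asymptotic bias is \(0.8/1.1125\approx 0.72\) and \(\mathrm{plim}\,\hat{\beta}_{OLS}\approx 2.72\), in line with the estimate of 2.67 above. In R:

# implied asymptotic OLS bias from the DGP parameters
rho * sigma_v^2 / (m * c^2 + sigma_v^2) # ~ 0.719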

Ex. 1.b

Consider the projection matrix \(P_Z\):

\[ P_Z = Z(Z'Z)^{-1}Z' \]

where \(Z\) is the \(n\times m\) matrix of instruments. In R:

#install.packages("matlib")
library(matlib)
## Warning: package 'matlib' was built under R version 4.3.3
pz <- z %*% inv(t(z) %*% z) %*% t(z)
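
As a quick sanity check (not required by the exercise), \(P_Z\) should be symmetric and idempotent:

# P_Z is symmetric and idempotent (up to floating-point error)
max(abs(pz - t(pz)))
max(abs(pz %*% pz - pz))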

Given the properties of the projection matrix, the 2SLS estimator is defined as:

\[ \beta_{2SLS}=(X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y \]

A <- t(x) %*% pz          # X'P_Z
B <- A %*% x              # X'P_Z X, a 1 x 1 matrix
B_inv <- B^(-1)           # elementwise inverse; valid here because B is 1 x 1

C <- t(x) %*% pz %*% y    # X'P_Z y
beta_2SLS <- B_inv * C
print(beta_2SLS)
##          [,1]
## [1,] 1.843433
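
Equivalently, the estimator can be computed in one line with base R's solve(), avoiding the matlib dependency (shown here only as a cross-check):

# 2SLS in one line: solve (X'P_Z X) b = X'P_Z y for b
solve(t(x) %*% pz %*% x, t(x) %*% pz %*% y)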

Therefore, the 2SLS estimate (about 1.84) is much closer to the true parameter \(\beta=2\) than the OLS estimate (about 2.67).

Ex. 1.c

ols_data_2 <- data.frame(cbind(x, z))
colnames(ols_data_2) <- c("x", paste0("z", 1:m))
# First stage: regress x on all five instruments
pi_IV <- lm(x ~ z1 + z2 + z3 + z4 + z5, data = ols_data_2)
# Print the estimated first-stage coefficients in the console
coef(pi_IV)
## (Intercept)          z1          z2          z3          z4          z5 
##    -0.01140     0.17461     0.13820     0.13268     0.11829     0.14288

Now we construct the fitted values by hand from the (rounded) first-stage coefficients:

x_hat = -0.01140 + 0.17461*z[,1] + 0.13820*z[,2] + 0.13268*z[,3] + 0.11829*z[,4] + 0.14288*z[,5]
ols_data_3 <- data.frame(cbind(x_hat,y))
colnames(ols_data_3) <- c("x_hat", "y")
beta_IV <- lm(y ~ x_hat, data = ols_data_3)
# Print the result in the console
summary(beta_IV)
## 
## Call:
## lm(formula = y ~ x_hat, data = ols_data_3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.5274 -1.8815  0.0739  1.8057 10.0676 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.00645    0.08855  11.366  < 2e-16 ***
## x_hat        1.85110    0.27135   6.822 1.55e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.798 on 998 degrees of freedom
## Multiple R-squared:  0.04455,    Adjusted R-squared:  0.0436 
## F-statistic: 46.54 on 1 and 998 DF,  p-value: 1.555e-11
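
Instead of retyping rounded coefficients, the exact fitted values can be obtained with fitted() on the first-stage object (shown as an alternative; tiny differences in the last digits come from the rounding above):

# alternative: exact first-stage fitted values
x_hat_exact <- fitted(pi_IV)
coef(lm(y ~ x_hat_exact))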

Ex. 1.d

The IV coefficient estimated with the "ivreg" package is 1.8511, identical to the one from the manual two-step procedure. The standard errors differ, however: the second-stage OLS in 1.c computes residuals with \(\hat{x}\) instead of \(x\), so its reported standard error (0.271) is not a valid 2SLS standard error, while ivreg reports the correct one (0.105). The diagnostic tests below also line up with the DGP: the instruments are jointly strong, the Wu-Hausman test confirms the endogeneity induced by \(\rho=0.8\), and the Sargan test does not reject the overidentifying restrictions.

library(ivreg)
## Warning: package 'ivreg' was built under R version 4.3.3
iv_data <- data.frame(cbind(x,y,z))
colnames(iv_data) <- c("x", "y", "z1", "z2", "z3", "z4", "z5")

beta_IV_2 <- ivreg(y ~ x | z1 + z2 + z3 + z4 + z5, data = iv_data)
summary(beta_IV_2)
## 
## Call:
## ivreg(formula = y ~ x | z1 + z2 + z3 + z4 + z5, data = iv_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.79408 -0.72668  0.04136  0.71690  4.06361 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.00645    0.03424   29.40   <2e-16 ***
## x            1.85110    0.10491   17.64   <2e-16 ***
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   5 994    21.608  <2e-16 ***
## Wu-Hausman         1 997   220.065  <2e-16 ***
## Sargan             4  NA     1.924    0.75    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.082 on 998 degrees of freedom
## Multiple R-Squared: 0.8572,  Adjusted R-squared: 0.857 
## Wald test: 311.3 on 1 and 998 DF,  p-value: < 2.2e-16
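
Finally, a compact side-by-side comparison of the three estimates against the true \(\beta=2\) (illustrative, using the objects created above):

# compare all estimates with the true slope
c(true = 2,
  OLS = unname(coef(beta_OLS)["x"]),
  manual_2SLS = beta_2SLS[1, 1],
  ivreg = unname(coef(beta_IV_2)["x"]))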