Instrumental Variable Quantile Regression

Instrumental variable quantile regression (Chernozhukov and Hansen, 2005) provides the ability to study heterogeneous impacts of variables and serves as a valuable tool in empirical analysis in economics, where endogeneity commonly arises. To facilitate its power and make ease for empirical practitioners, I develop an R package implementing the corresponding estimation procedure and Wald test (Chernozhukov and Hansen, 2006), inference on quantile process with score subsampling (Chernozhukov and Hansen, 2006), and weak IV robust inference (Chernozhukov and Hansen, 2008). Also, I replicate the empirical example in Chernozhukov and Hansen (2004) to demonstrate its usage.

Introduction

In the seminal paper, Chernozhukov and Hansen (2005) propose an instrumental variable extension of quantile regression. The model, in a potential outcome framework, can be expressed as: \[Y := Y_D,\] where \(Y\) and D denotes the outcome variable and the treatment variable(s) respectively. The object of interest, the quantile treatment effect (QTE), is the \(\tau\)-th quantile of potential outcomes under various potential treatment states \(d\), conditional on observed characteristics \(X = x\), denoted as \[QTE = q(\tau,d_1,x) - q(\tau,d_0,x),\] \[q(\tau,d_0,x) = Q_{Y_d}(\tau|x),\]

Let \(Z\) denote the instrumental variable(s). Under the main assumptions in their model, the quantile treatment effects are non-parametrically identified by the following moment conditions: \[ P[Y < q(\tau,D,X)|X,Z] = \tau, \forall \tau \in (0,1) \] However, the moment conditions are non-convex and non-smooth, making the estimation computationally difficult in problems with a practically relevant size.

Several approaches have been developed to overcome this. The most popular method is the inverse quantile method proposed by Chernozukov and Hansen (2006), which is what I implement in the package. It focuses on linear-in-parameter structure quantile models: \[q(\tau,d,x) = d\alpha_0(\tau) + x' \beta_0(\tau)\] The method circumvents the computational difficulty by observing the following restrictions on the conditional quantile function: \[ Q_{Y-D'\alpha_0}(\tau|X,Z) = X'\beta_0 + Z'\gamma_0 \space with \space \gamma_0 = 0 \]

can be derived from the moment condition. As a result, the estimation problem amounts to finding the coefficients that satisfy the restriction, and this can be done by grid searching over the possible values of \(\alpha\). Each evaluation at a point of the grid only involves a quantile regression, which can be done quite efficiently with the interior point-preprocessing methods introduced by Portnoy and Koenker (1997). The method is most useful when the dimension of \(\alpha\) is small, which is usually the case. However, this method is not practical for large dimensions of alpha as the computations needed will grow exponentially.

Coefficient Estimation and Standard Errors

In this section, we discuss how to estimate the QTE function with the package. As it is most easily explained with an example, I will replicate the 401k empirical analysis done in Chernozhukov and Hansen (2004).

First we load the package and data, then we ensure that ‘icat’ and ‘ecat’ are stored as categorical variables.

library(IVQR)
data(CH04)
CH04$icat <- factor(CH04$icat)
CH04$ecat <- factor(CH04$ecat)

Follow the model specification suggested by Chernozhukov and Hansen (2004):

model <- net_tfa ~ p401 | e401 | icat+ecat+a1+a2+a3+a4+marr+fsize+twoearn+db+pira+hown

net_tfa is the outcome variable, which is the net financial assets as a measure of wealth. Variable p401 indicates 401(k) participation status and is instrumented for by e401, 401(k) eligibility. The remaining variables are the controls, consisting of dummies for income category, dummies for age category, dummies for education category, a marital status indicator, family size, two-earner status, DB pension status, IRA participation status, homeownership status, and a constant.

Set up the parameters to feed in the function ivqr(). Explictly, specify the following: -quantile(s) intended for the estimation -grid for the grid search -qrMethod to determine which algorithm to employ in the quantile regression

taus <- 0.5
grid <- seq(0,25000,125)
qrMethod <- 'br'

An easy way to find out the ranges of the grid is to construct one with a two stage quantile regression (2SQR). The function suggest_grid() provides a grid centered at the estimate from 2SQR, with the range determined by the standard error of the estimator. However, it is recommended to use the function diagnostic() to see if the objective function is minimized.

fit_median <- ivqr(formula = iqr_model, taus=taus, data=CH04, grid=grid, qrMethod='br')

Printing the result from ivqr() will give the estimates and standard errors obtained from the kernel estimator.

print(fit_median)

## 
## tau:[1] 0.5
## 
## Coefficients of endogenous variables
##      coef       se
## [1,] 5700 619.8744
## 
## Coefficients of control variables
##               coef        se
##  [1,]  2762.945026  736.1695
##  [2,]   118.660061  518.8966
##  [3,]   248.774828  561.6435
##  [4,]   735.754020  641.3335
##  [5,]  2518.639374  771.9014
##  [6,]  7272.944854  883.4196
##  [7,] 29895.175070 2984.1171
##  [8,]    -6.702907  428.2371
##  [9,]  -361.747646  496.1048
## [10,]  -163.734193  602.4277
## [11,] -2934.258800  642.8791
## [12,] -2917.188055  622.0741
## [13,] -2815.174642  619.3531
## [14,] -2361.705826  655.2678
## [15,]   364.879765  477.1729
## [16,]  -157.675658  117.4195
## [17,] -2399.193333  482.9578
## [18,]     7.933891  393.4496
## [19,] 20442.559240  931.7845
## [20,]   318.071999  333.1839

Inference on the Quantile Process

To estimate the whole quantile process and to further make inferences on it, first specify the parameter taus as a grid from 0 to 1.

taus <- seq(0.02,0.98,0.02)
fit <- ivqr(formula = iqr_model, taus=taus, data=CH04, grid=grid, qrMethod='br')

Instead of printing the fitted model, a more elegant and informative way is to plot the estimates over quantiles for each variable separately. Here, the effect of the 401k program is our main interest, so we plot the corresponding estimates:

plot(fit)

With the estimated process, we can further make inferences on the process. For example, we may test the null hypothesis that the program has no effect, i.e. \(\alpha(\tau)= \forall \tau \in (0,1)\). This can be done by calculating the Kolmogorov-Smirnov test statistics with critical values obtained from score subsampling:

ivqr.ks(fit, trim = c(0.1,0.9), nullH = "No_Effect")

## [1] 12.61846
##      90%      95%      99% 
## 2.564384 2.827400 3.439025 
## [1] "Block size: 198"

We can also test the null hypothesis that the effect is constamt across quantiles. There, we spcify the parameter nullH as “Location_Shift”

ivqr.ks(fit, trim = c(0.05,0.95), nullH = "Location_Shift")

## [1] 8.000855
##      90%      95%      99% 
## 2.726923 2.991167 3.457061 
## [1] "Block size: 198"

Weak IV Robust Inference

Chernozhukov and Hansen (2008) also provides a testing procedure that is robust to weak instruments. The function weakIVtest() calculates the weak IV robust confidence interval, and we visualize with plot():

plot(weakIVtest(fit))

Conclusion

In conclusion, the IVQR is a useful method to study heterogeneity arising in various settings. However, the use of IVQR is not widespread due to the nontrivial programming involved. This package attempts to make IVQR more approachable to applied practitioners.