1 Introduction


      Instrumental variables (IVs) are a valuable tool that lets us estimate regression models containing variables that would otherwise be endogenous. IVs do so through two-stage least squares (2SLS), in which an exogenous variable is used to predict the endogenous variable in our initial model. The predicted values, free of endogeneity, are then used in place of the endogenous variable in the original model.
      To give an example of an IV, I will review a 2006 paper by Jeffrey R. Kling titled “Incarceration Length, Employment, and Earnings.” This paper, originally published in the American Economic Review, investigates the impact of incarceration length on earnings and job prospects after a prisoner is released. This is an interesting question because the answer could go either way. It is possible that more time spent in prison leads to worse outcomes because the individual loses touch with the workforce and becomes “institutionalized.” On the flip side, it is possible that prisoners are rehabilitated while incarcerated, using the learning opportunities and other resources in prison to emerge in a better situation than when they entered. The paper is therefore investigating whether incarceration leads to rehabilitation or is just a form of punishment.

2 Paper Background


      Firstly, we must address why running a simple OLS model is not a viable way to answer the paper’s research question.

\(\log(\text{earnings}_{i}) = \beta_{0} + \beta_{1}\text{IncarcerationLength}_{i} + \delta_{1}\text{controls} + u_{i}\)


      To explain why OLS is impractical in this particular example, we must remember the five Gauss-Markov assumptions required for the OLS estimator to be BLUE (the best linear unbiased estimator). Specifically, we must think of the zero conditional mean (ZCM) assumption, which requires the error term to be unrelated to the explanatory variables. Incarceration length, while correlated with earnings, is more importantly correlated with the crime the person committed. If a man robs a gas station, the nature of that crime largely determines his incarceration length. Additionally, a man who robs a gas station likely has few other options to make money, and those limited options would also hold down his earnings after release. Because factors like these sit in the error term and are correlated with incarceration length, the variable is endogenous: the model suffers from omitted variable bias, which is a violation of the ZCM assumption.
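      To make the bias concrete, the simulation below is a minimal sketch and does not use Kling's data. Every number in it is made up for illustration: the assumed true effect of -0.05, the coefficients, and the hypothetical variable `unobserved`, which stands in for something like weak outside options that both lengthens the sentence and lowers earnings directly.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    return cov(x, y) / cov(x, x)


random.seed(0)
n = 20_000
true_effect = -0.05  # assumed effect of one more year on log earnings (made up)

# Hypothetical omitted factor: it raises incarceration length and lowers
# earnings, so it ends up in the error term of the naive regression.
unobserved = [random.gauss(0, 1) for _ in range(n)]
length = [3 + 1.0 * w + random.gauss(0, 1) for w in unobserved]
log_earn = [9 + true_effect * x - 0.3 * w + random.gauss(0, 1)
            for x, w in zip(length, unobserved)]

# Regressing log earnings on length alone ignores the omitted factor.
naive_slope = ols_slope(length, log_earn)
print(f"true effect: {true_effect:.3f}, naive OLS slope: {naive_slope:.3f}")
```

      Under these assumed numbers the naive slope settles well below the true effect, because OLS attributes the earnings penalty from the omitted factor to incarceration length.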
      To alleviate the endogeneity issue in the model, Kling opts to use an instrumental variable. When selecting an IV, two conditions must be met: instrument exogeneity and instrument relevance. Instrument exogeneity means that the instrument does not suffer from the same bias issues as the endogenous variable it is standing in for. Mathematically, this can be expressed as \(Cov(z, u) = 0\), where z is the instrument and u is the error term. Exogeneity is important because it ensures that the instrument is not affecting the dependent variable via some pathway included in the error term. The second condition is relevance. The instrument is relevant if it is correlated with our endogenous variable x. This can be expressed mathematically as \(Cov(z, x) \neq 0\).
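      As a short worked illustration of why both conditions are needed, consider the simplified case of one instrument and no controls, \(y = \beta_{0} + \beta_{1}x + u\). This simplification is an assumption made here for exposition and is not Kling's specification. Taking the covariance of both sides with z gives

\(Cov(z, y) = \beta_{1}Cov(z, x) + Cov(z, u)\)

      Exogeneity removes the last term, and relevance allows us to divide by \(Cov(z, x)\), so that

\(\beta_{1} = \frac{Cov(z, y)}{Cov(z, x)}\)

      If relevance failed, the denominator would be zero, and if exogeneity failed, the leftover \(Cov(z, u)\) term would bias the estimate.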
      Kling chose “leniency of a judge” as an instrument. Leniency, in this case, is measured by the sentence length a judge hands down for the same crime: judges who, on average, imprison people for less time are categorized as more lenient, and vice versa.
      Leniency is an exogenous instrument since judges are assigned randomly and, therefore, leniency is not correlated with anything in the error term, such as a defendant's unobserved earnings potential. An argument could be made that judges in low-income areas might become pessimistic and harsher, leading to a correlation between income and judge leniency; however, there is no evidence for this concern. Secondly, the instrument is relevant since there is a clear relationship between judge leniency and incarceration length.

3 Model Derivation


      As I mentioned previously, we cannot use the simple equation:
\(\log(\text{earnings}_{i}) = \beta_{0} + \beta_{1}\text{IncarcerationLength}_{i} + \delta_{1}\text{controls} + u_{i}\)

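      To appropriately use an IV, we must first predict the endogenous variable using the exogenous instrument we have chosen, along with the controls. This first-stage equation has its own coefficients, written here as \(\pi\) and \(\gamma\) to keep them distinct from those of the earnings equation, and would look something like this:
\(\text{IncarcerationLength}_{i} = \pi_{0} + \pi_{1}\text{JudgeLeniency}_{i} + \gamma_{1}\text{controls} + v_{i}\)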

      This process is called 2SLS, or two-stage least squares: in the first stage we predict the endogenous variable using the exogenous instrument, and in the second stage we use those predicted values in place of the endogenous variable in our original model. The final equation would therefore look like this:
\(\log(\text{earnings}_{i}) = \beta_{0} + \beta_{1}\widehat{\text{IncarcerationLength}}_{i} + \delta_{1}\text{controls} + \varepsilon_{i}\)

      Where \(\widehat{\text{IncarcerationLength}}_{i}\) represents the predicted values of incarceration length from the first-stage regression and \(\varepsilon_{i}\) is the second-stage error term.
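      The two stages can be sketched on simulated data. This is a minimal illustration under stated assumptions, not a replication of Kling's estimates: the data are generated here, the assumed true effect of -0.05 and all other coefficients are made up, controls are left out for brevity, and `leniency`, `unobserved`, and `fit_line` are hypothetical names.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def fit_line(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


random.seed(1)
n = 20_000
true_effect = -0.05  # assumed effect of one more year on log earnings (made up)

leniency = [random.gauss(0, 1) for _ in range(n)]    # instrument, randomly assigned
unobserved = [random.gauss(0, 1) for _ in range(n)]  # omitted factor in the error term
length = [3 - 0.8 * z + 1.0 * w + random.gauss(0, 1)
          for z, w in zip(leniency, unobserved)]
log_earn = [9 + true_effect * x - 0.3 * w + random.gauss(0, 1)
            for x, w in zip(length, unobserved)]

# Stage 1: predict incarceration length from judge leniency.
intercept_1, slope_1 = fit_line(leniency, length)
length_hat = [intercept_1 + slope_1 * z for z in leniency]

# Stage 2: regress log earnings on the predicted lengths.
_, beta_2sls = fit_line(length_hat, log_earn)

# Naive OLS on the observed lengths, for comparison.
_, beta_ols = fit_line(length, log_earn)

print(f"first-stage slope: {slope_1:.3f}")
print(f"true: {true_effect:.3f}, 2SLS: {beta_2sls:.3f}, OLS: {beta_ols:.3f}")
```

      In this setup the 2SLS estimate lands close to the assumed true effect, while naive OLS is pulled away from it by the omitted factor. Running the second stage by hand like this gives the right coefficient but not the right standard errors, so in practice a dedicated IV routine should be used for inference.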