Based on Angrist et al. (2002)

A Brief Summary

Main Questions:

How do school voucher programs affect the likelihood that a student finishes 8th grade?

Do voucher lottery winners differ from losers in school enrollment?

Do lottery winners have different results on achievement tests?

Do lottery winners show different work and marriage patterns?

Is there any evidence that these school voucher programs are effective at creating better educational experiences for some students?

Main Findings:

Lottery winners were about 10 percentage points more likely to have finished 8th grade, primarily because they were less likely to repeat grades, and scored 0.2 standard deviations higher on achievement tests.

There were no significant differences between lottery winners and losers in enrollment three years after application; most pupils in both groups were still in school.

Lottery winners were 15 percentage points more likely to attend private schools rather than public schools.

Lottery winners had completed an additional 0.1 years of school and were about 10 percentage points more likely than losers to have completed eighth grade.

Initial Regression & Evaluating Exogeneity

usesch: The coefficient of 0.13673 indicates that students who used a private school scholarship/voucher were, on average, 13.673 percentage points more likely to have finished 8th grade. (Note: This coefficient is significant at the 5% level of significance.)

The variable usesch is likely endogenous in the sample regression equation because omitted variables are likely to be correlated with usesch. For example, whether or not a student is currently in school may be correlated with whether or not they are using a scholarship.

## 
## Call:
## lm(formula = finish8 ~ usesch, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7419 -0.6052  0.2581  0.2581  0.3948 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  0.60521    0.02058  29.407 < 0.0000000000000002 ***
## usesch       0.13673    0.02683   5.095          0.000000403 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4597 on 1210 degrees of freedom
## Multiple R-squared:  0.02101,    Adjusted R-squared:  0.0202 
## F-statistic: 25.96 on 1 and 1210 DF,  p-value: 0.0000004033
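
The endogeneity concern raised for usesch above can be stated with the standard omitted-variable result for a simple regression. This is a textbook identity rather than something estimated in this document; here u stands for the error term of the finish8 equation and b1 for the true coefficient on usesch.

```latex
\operatorname{plim}\,\hat{b}_1^{\,OLS}
  = b_1 + \frac{\operatorname{Cov}(\text{usesch},\, u)}{\operatorname{Var}(\text{usesch})}
```

If unobserved factors that raise voucher use also raise 8th grade completion, the covariance term is positive and OLS overstates b1; if they work in opposite directions, OLS understates it. A valid instrument removes this term.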

Evaluating Potential Instruments

Looking at the regressions below, where we regress usesch on each potential instrument, we can see that only lottery and age are relevant at the 5% level of significance.

id: id is likely not relevant. An arbitrary id should have no effect at all on whether or not a student uses a scholarship, so it fails the relevance requirement even though it is plausibly exogenous.

lottery: lottery is likely to be both relevant and exogenous. Winning the lottery for a scholarship voucher is very likely to be linked to whether or not a student uses a scholarship voucher. It is also likely that winning a random lottery has no direct connection to whether or not that student has finished 8th grade, other than through voucher use. This variable is therefore likely to be exogenous.

age: age is likely to be relevant but not exogenous. A person's age may be linked to what kinds of scholarships they are able to receive and use, but it may also have a direct impact on their ability to finish 8th grade. This variable is therefore unlikely to be exogenous.

Based on the relevance tests performed below and the arguments for exogeneity above, the variable most likely to be a valid instrument for usesch is lottery because it is both relevant and exogenous.

## 
## Call:
## lm(formula = usesch ~ id, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6081 -0.5835  0.3972  0.4144  0.4318 
## 
## Coefficients:
##                Estimate  Std. Error t value            Pr(>|t|)    
## (Intercept)  0.60843749  0.02839143  21.430 <0.0000000000000002 ***
## id          -0.00004032  0.00004925  -0.819               0.413    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4924 on 1210 degrees of freedom
## Multiple R-squared:  0.0005536,  Adjusted R-squared:  -0.0002724 
## F-statistic: 0.6702 on 1 and 1210 DF,  p-value: 0.4131
## 
## Call:
## lm(formula = usesch ~ lottery, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.91097 -0.24014  0.08903  0.08903  0.75986 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)  0.24014    0.01494   16.08 <0.0000000000000002 ***
## lottery      0.67083    0.02073   32.35 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3607 on 1210 degrees of freedom
## Multiple R-squared:  0.4638, Adjusted R-squared:  0.4634 
## F-statistic:  1047 on 1 and 1210 DF,  p-value: < 0.00000000000000022
## 
## Call:
## lm(formula = usesch ~ age, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7045 -0.5883  0.3885  0.4117  0.5047 
## 
## Coefficients:
##             Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)  0.93696    0.15805   5.928 0.00000000399 ***
## age         -0.02325    0.01049  -2.215        0.0269 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4916 on 1210 degrees of freedom
## Multiple R-squared:  0.004038,   Adjusted R-squared:  0.003215 
## F-statistic: 4.906 on 1 and 1210 DF,  p-value: 0.02694
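
As a rough illustration of the relevance check, the sketch below runs a first-stage regression for two candidate instruments and pulls out the F statistic. The data are simulated stand-ins (not the original sample), and the helper name `first_stage_F` is hypothetical. A common rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument.

```r
# Simulated stand-in for the voucher data (NOT the original sample).
set.seed(1)
n       <- 1200
lottery <- rbinom(n, 1, 0.5)                     # randomized voucher offer
id      <- sample.int(n)                         # arbitrary identifier
usesch  <- rbinom(n, 1, 0.24 + 0.67 * lottery)   # take-up depends on lottery only
sim     <- data.frame(id, lottery, usesch)

# Hypothetical helper: first-stage F statistic for one candidate instrument.
first_stage_F <- function(instrument, data) {
  fit <- lm(reformulate(instrument, response = "usesch"), data = data)
  unname(summary(fit)$fstatistic["value"])
}

F_lottery <- first_stage_F("lottery", sim)
F_id      <- first_stage_F("id", sim)

# Compare the two candidates side by side.
c(lottery = F_lottery, id = F_id)
```

With a single instrument, the F statistic is the square of the t statistic on that instrument, which is why the lottery regression above reports t = 32.35 alongside F = 1047.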

Implement 2SLS

Steps to implement 2SLS

  1. Estimate the first stage: usesch = a0 + a1*lottery + e

  2. Obtain the fitted values of usesch (usesch_hat)

  3. Use the fitted values to estimate the second stage: finish8 = b0 + b1*usesch_hat + u
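
The three steps can be sketched in R as follows. The data frame `sim` is simulated for illustration (it is not the original sample), and the variable `ability` is a hypothetical unobserved confounder added so that OLS and 2SLS differ.

```r
# Simulated stand-in for the voucher data (hypothetical, NOT the original sample).
set.seed(2)
n       <- 1200
ability <- rnorm(n)                               # unobserved confounder
lottery <- rbinom(n, 1, 0.5)                      # randomized offer
usesch  <- rbinom(n, 1, plogis(-1.2 + 3.5 * lottery + 0.8 * ability))
finish8 <- rbinom(n, 1, plogis(0.3 + 0.7 * usesch + 0.8 * ability))
sim     <- data.frame(lottery, usesch, finish8)

# Step 1: first stage, usesch on the instrument.
first_stage <- lm(usesch ~ lottery, data = sim)

# Step 2: fitted values of usesch.
sim$usesch_hat <- fitted(first_stage)

# Step 3: second stage, finish8 on the fitted values.
second_stage <- lm(finish8 ~ usesch_hat, data = sim)
b_2sls <- unname(coef(second_stage)["usesch_hat"])

# Cross-check: with one instrument and no controls, the 2SLS slope
# equals the ratio of covariances with the instrument.
b_ratio <- with(sim, cov(finish8, lottery) / cov(usesch, lottery))

c(ols   = unname(coef(lm(finish8 ~ usesch, data = sim))["usesch"]),
  tsls  = b_2sls,
  ratio = b_ratio)
```

The point estimate from this manual procedure is the 2SLS estimate, but the standard errors printed by the second `lm` call ignore the fact that usesch_hat is itself estimated, so they should not be used for inference.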

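Comparing coefficient to original regression: The new coefficient on usesch (0.16618) is larger than that obtained in the original regression (0.13673); both are shown below. The difference is small relative to the standard errors (0.03953 and 0.02683), so on its own it offers only weak evidence that usesch was endogenous in the original regression. Note that the standard errors reported for the manual second stage do not account for the estimated first stage.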

## 
## Call:
## lm(formula = finish8 ~ usesch_hat, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7393 -0.6278  0.2607  0.3722  0.3722 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  0.58788    0.02677  21.964 < 0.0000000000000002 ***
## usesch_hat   0.16618    0.03953   4.204            0.0000282 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4613 on 1210 degrees of freedom
## Multiple R-squared:  0.01439,    Adjusted R-squared:  0.01358 
## F-statistic: 17.67 on 1 and 1210 DF,  p-value: 0.00002818
## 
## Call:
## lm(formula = finish8 ~ usesch, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7419 -0.6052  0.2581  0.2581  0.3948 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  0.60521    0.02058  29.407 < 0.0000000000000002 ***
## usesch       0.13673    0.02683   5.095          0.000000403 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4597 on 1210 degrees of freedom
## Multiple R-squared:  0.02101,    Adjusted R-squared:  0.0202 
## F-statistic: 25.96 on 1 and 1210 DF,  p-value: 0.0000004033
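
Because lottery is a single binary instrument and these regressions include no other regressors, the coefficient on usesch_hat can also be read as a ratio of two differences in means. The identity below is standard; the final number is back-calculated from the two coefficients reported in this document (0.16618 and the first-stage 0.67083), not taken from a separately reported regression. Here the y-bar terms are the mean of finish8 among lottery winners (1) and losers (0).

```latex
\hat{b}_1^{\,IV}
  = \frac{\widehat{\operatorname{Cov}}(\text{finish8},\,\text{lottery})}
         {\widehat{\operatorname{Cov}}(\text{usesch},\,\text{lottery})}
  = \frac{\bar{y}_{1} - \bar{y}_{0}}{\hat{a}_1},
\qquad
\bar{y}_{1} - \bar{y}_{0}
  = \hat{b}_1^{\,IV}\,\hat{a}_1
  \approx 0.16618 \times 0.67083
  \approx 0.1115
```

In words, winning the lottery is associated with roughly an 11 percentage point higher rate of finishing 8th grade, and dividing by the 67 percentage point jump in voucher use scales this up to the effect of actually using the voucher.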

Using ivreg Command

Lottery may not meet the exogeneity requirement for an instrument unless we add the explanatory variables strata1 through strata5, svy, and phone to our first- and second-stage regressions. The results using these controls appear below.

Additionally, to get a sense of what happens when we use a known bad instrument, we use id as the instrument and observe how the coefficient on usesch changes. The results below do not really make sense: they indicate that a student who used a private school scholarship/voucher was on average about 92.9 percentage points more likely to have finished 8th grade. This is much larger than the coefficient observed using lottery as an instrument (0.160653). With id as the instrument, no explanatory variable is significant at the 5% level of significance, and the weak instruments test (F = 0.886, p-value 0.347) confirms that id is not relevant.

## 
## Call:
## ivreg(formula = finish8 ~ usesch + strata1 + strata2 + strata3 + 
##     strata4 + strata5 + svy + phone | lottery + strata1 + strata2 + 
##     strata3 + strata4 + strata5 + svy + phone, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8298 -0.5913  0.2434  0.3110  0.4716 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.803015   0.210422   3.816 0.000142 ***
## usesch       0.160653   0.039330   4.085 0.000047 ***
## strata1      0.017497   0.048243   0.363 0.716899    
## strata2      0.062902   0.037906   1.659 0.097296 .  
## strata3      0.136109   0.049504   2.749 0.006058 ** 
## strata4      0.333124   0.190656   1.747 0.080849 .  
## strata5      0.386675   0.326059   1.186 0.235893    
## svy         -0.004638   0.030556  -0.152 0.879392    
## phone       -0.270017   0.205891  -1.311 0.189955    
## 
## Diagnostic tests:
##                   df1  df2 statistic             p-value    
## Weak instruments    1 1203  1045.835 <0.0000000000000002 ***
## Wu-Hausman          1 1202     1.076                 0.3    
## Sargan              0   NA        NA                  NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4585 on 1203 degrees of freedom
## Multiple R-Squared: 0.03175, Adjusted R-squared: 0.02531 
## Wald test: 4.197 on 8 and 1203 DF,  p-value: 0.00005667
## 
## Call:
## ivreg(formula = finish8 ~ usesch + strata1 + strata2 + strata3 + 
##     strata4 + strata5 + svy + phone | id + strata1 + strata2 + 
##     strata3 + strata4 + strata5 + svy + phone, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.12451 -0.13048 -0.06662 -0.03282  0.89663 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.220032   1.025164   0.215    0.830
## usesch       0.929452   1.302571   0.714    0.476
## strata1      0.015699   0.063637   0.247    0.805
## strata2      0.027113   0.078520   0.345    0.730
## strata3      0.084997   0.108360   0.784    0.433
## strata4      0.115399   0.446057   0.259    0.796
## strata5      0.425216   0.434530   0.979    0.328
## svy         -0.006689   0.040409  -0.166    0.869
## phone       -0.109974   0.383408  -0.287    0.774
## 
## Diagnostic tests:
##                   df1  df2 statistic p-value
## Weak instruments    1 1203     0.886   0.347
## Wu-Hausman          1 1202     0.654   0.419
## Sargan              0   NA        NA      NA
## 
## Residual standard error: 0.6041 on 1203 degrees of freedom
## Multiple R-Squared: -0.6809, Adjusted R-squared: -0.692 
## Wald test:  1.28 on 8 and 1203 DF,  p-value: 0.2498
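
A minimal sketch of how output like the above could be produced is given below. It assumes the `ivreg` function from the AER package, whose summary format resembles the output shown; the package actually used is not stated in this document. The data are simulated, and only one control (`svy`) is included for brevity; strata1 through strata5 and phone would enter both sides of the `|` in the same way.

```r
# Simulated stand-in for the voucher data (hypothetical, NOT the original sample).
set.seed(3)
n       <- 1200
svy     <- rbinom(n, 1, 0.5)                            # exogenous control
lottery <- rbinom(n, 1, 0.5)                            # instrument
usesch  <- rbinom(n, 1, 0.2 + 0.6 * lottery + 0.1 * svy)
finish8 <- rbinom(n, 1, 0.5 + 0.2 * usesch + 0.1 * svy)
sim     <- data.frame(svy, lottery, usesch, finish8)

# Manual 2SLS with the control included in BOTH stages (base R only).
sim$usesch_hat <- fitted(lm(usesch ~ lottery + svy, data = sim))
b_manual <- unname(coef(lm(finish8 ~ usesch_hat + svy, data = sim))["usesch_hat"])

# Same model via ivreg, if AER is installed (assumed package).
# Left of "|": regressors; right of "|": instrument plus exogenous controls.
if (requireNamespace("AER", quietly = TRUE)) {
  iv_fit <- AER::ivreg(finish8 ~ usesch + svy | lottery + svy, data = sim)
  # diagnostics = TRUE adds the weak-instruments, Wu-Hausman and Sargan rows.
  print(summary(iv_fit, diagnostics = TRUE))
  # Point estimates agree; only ivreg reports valid 2SLS standard errors.
  stopifnot(isTRUE(all.equal(b_manual, unname(coef(iv_fit)["usesch"]))))
}
```

The Sargan row is NA in the results above because there is exactly one instrument for one endogenous regressor, so there are no overidentifying restrictions to test.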