Part 1: Paper Using Randomized Data: Impact of Class Size on Learning

Article: Krueger (1999), "Experimental Estimates of Education Production Functions," QJE 114(2): 497-532

c. What is the identification strategy?

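Students are randomly assigned to class sizes, so a student’s class size is uncorrelated with any confounding variable, whether observed or unobserved. If the randomization is carried out properly, the endogeneity problem is removed and a comparison of outcomes across class sizes gives an unbiased estimate of the effect of class size.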
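The argument can be sketched in potential-outcomes notation. The symbols are introduced here for illustration and are not taken from the paper: $D_i = 1$ if student $i$ is assigned to a small class, and $Y_i(1)$, $Y_i(0)$ are the student's test scores with and without a small class.

```latex
\begin{align*}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{effect on the treated}} \\
  &\quad + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

Under random assignment, $D_i$ is independent of $(Y_i(0), Y_i(1))$, so the selection-bias term is zero and the first term equals the average treatment effect $E[Y_i(1) - Y_i(0)]$.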

d. What are the assumptions / threats to this identification strategy?

The identification strategy rests entirely on randomization, so the study is only as good as the randomization itself. The main threat is that the environment is not fully controlled: students can enter and leave the treated classes after the initial assignment. If, for instance, a new student entering a small class comes from a better school with smaller classes, that student's outcomes partly reflect prior schooling rather than the treatment, which would bias the estimated impact of class size on the outcome variables.

Part 2.1: Paper Using Twins for Identification: Economic Returns to Schooling

Article: Ashenfelter and Krueger (1994), "Estimates of the Economic Return to Schooling from a New Sample of Twins," AER 84(5): 1157-1173

c. What is the identification strategy?

The paper compares wages within pairs of identical (monozygotic) twins. Because the twins in a pair are genetically identical and share a family background, the difference in their wages can be attributed to the difference in their education.
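A simplified version of this argument can be written as a worked equation. The notation is an assumption made here for illustration and leaves out the additional covariates used in the paper: $y_{1j}$ and $y_{2j}$ are the log wages of the two twins in pair $j$, $S_{1j}$ and $S_{2j}$ are their years of schooling, and $\mu_j$ is the unobserved genetic and family component shared by both twins.

```latex
\begin{align*}
y_{1j} &= \alpha + \beta S_{1j} + \mu_j + \varepsilon_{1j} \\
y_{2j} &= \alpha + \beta S_{2j} + \mu_j + \varepsilon_{2j} \\
y_{1j} - y_{2j} &= \beta\,(S_{1j} - S_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})
\end{align*}
```

Differencing removes $\mu_j$, so $\beta$ is estimated consistently as long as the within-pair schooling difference is uncorrelated with $\varepsilon_{1j} - \varepsilon_{2j}$.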

d. What are the assumptions / threats to this identification strategy?

The crucial assumption, as noted above, is that monozygotic twins are genetically identical and share the same family background, so that nothing else that affects wages differs systematically within a pair. A threat to this strategy is that, despite these shared characteristics, the twins' tastes and preferences can differ. If those differences affect both years of schooling and wages, the within-pair estimate is still biased.

Part 2.2: Replication Analysis

a. Load Ashenfelter and Krueger AER 1994 data.

b. Reproduce the result from table 3 column 5.

library(haven)
library(stargazer)
data <- read_dta("AshenfelterKrueger1994_twins.dta")

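# Within-pair differences in schooling and log wage (twin 1 minus twin 2)
data$diff1 <- data$educ1 - data$educ2
data$diff2 <- data$lwage1 - data$lwage2

# First-difference regression of the log-wage gap on the schooling gap
reg1 <- lm(diff2 ~ diff1, data = data)
stargazer(reg1, type = "text")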
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                diff2           
## -----------------------------------------------
## diff1                        0.092***          
##                               (0.024)          
##                                                
## Constant                      -0.079*          
##                               (0.045)          
##                                                
## -----------------------------------------------
## Observations                    149            
## R2                             0.092           
## Adjusted R2                    0.086           
## Residual Std. Error      0.554 (df = 147)      
## F Statistic           14.914*** (df = 1; 147)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
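To illustrate why differencing within twin pairs can help, here is a minimal simulation sketch. It does not use the Ashenfelter and Krueger data: the sample size, the true return of 0.09, and the way the shared family component enters schooling and wages are all assumptions chosen for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
n_pairs   <- 5000           # hypothetical number of twin pairs
beta_true <- 0.09           # assumed true return to a year of schooling

# Unobserved component shared by both twins in a pair (assumed)
family <- rnorm(n_pairs)

# Schooling and log wages both depend on the shared component
educ1  <- 12 + 2 * family + rnorm(n_pairs)
educ2  <- 12 + 2 * family + rnorm(n_pairs)
lwage1 <- 1 + beta_true * educ1 + 0.5 * family + rnorm(n_pairs, sd = 0.5)
lwage2 <- 1 + beta_true * educ2 + 0.5 * family + rnorm(n_pairs, sd = 0.5)

# Pooled OLS ignores the shared component and is biased upward in this setup
lwage_all <- c(lwage1, lwage2)
educ_all  <- c(educ1, educ2)
ols <- unname(coef(lm(lwage_all ~ educ_all))[2])

# Differencing within pairs removes the shared component
d_lwage <- lwage1 - lwage2
d_educ  <- educ1 - educ2
fd <- unname(coef(lm(d_lwage ~ d_educ))[2])

round(c(ols = ols, fd = fd), 3)
```

In this setup the first-difference estimate should land close to the assumed 0.09, while pooled OLS is pulled upward by the omitted family component. The simulation says nothing about the size or direction of the bias in the actual twins data.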

c. Explain how this coefficient should be interpreted.

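The coefficient is a within-pair estimate: a twin with one more year of education than their co-twin earns, on average, roughly 9.2 percent more (0.092 log points). Because the dependent variable is a difference in log wages, the coefficient is read as an approximate percentage change.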
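The percentage reading of a log-wage coefficient is an approximation. A short sketch of the exact conversion, using the rounded coefficient from the output above (the variable names are chosen here for illustration):

```r
beta_fd <- 0.092                       # rounded first-difference coefficient

# Exact percentage change implied by a coefficient in log points
pct_exact <- 100 * (exp(beta_fd) - 1)
round(pct_exact, 1)                    # 9.6
```

For a coefficient this small, the exact figure and the log-point approximation are close, which is why the 9.2 percent reading is commonly used.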

d. Reproduce the result in table 3 column 1.

# Reshape from wide (one row per twin pair) to long (one row per twin)
l <- reshape(data,
             varying = c("educ1", "lwage1", "male1", "white1",
                         "educ2", "lwage2", "male2", "white2"),
             v.names = c("educ", "lwage", "male", "white"),
             timevar = "twin",
             times = c("T1", "T2"),
             idvar = c("famid", "age"),
             direction = "long")

# Sorted copy for inspection; the regression below uses l
l.sort <- l[order(l$famid), ]

# Help for reshape() taken from https://stats.oarc.ucla.edu/r/faq/how-can-i-reshape-my-data-in-r/
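To check what the reshape step does, the same call can be tried on a small made-up data frame. The `toy` data and its values are hypothetical and only mirror the column naming of the twins file.

```r
# Hypothetical wide data: one row per twin pair
toy <- data.frame(
  famid  = c(1, 2),
  age    = c(30, 40),
  educ1  = c(12, 16), lwage1 = c(2.0, 2.5),
  educ2  = c(14, 16), lwage2 = c(2.2, 2.4)
)

# Wide to long: one row per twin, with a "twin" column marking T1 or T2
toy_long <- reshape(toy,
                    varying   = c("educ1", "lwage1", "educ2", "lwage2"),
                    v.names   = c("educ", "lwage"),
                    timevar   = "twin",
                    times     = c("T1", "T2"),
                    idvar     = "famid",
                    direction = "long")

# Sort so the two twins of each family sit next to each other
toy_long <- toy_long[order(toy_long$famid, toy_long$twin), ]
toy_long
```

Each family now contributes two rows, and variables that do not vary within a pair, such as `age`, are repeated on both rows.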

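# Age squared, divided by 100 to keep the coefficient on a readable scale
l$age2 <- l$age^2 / 100
# Pooled OLS of log wage on schooling and controls
reg2 <- lm(lwage ~ educ + age + age2 + male + white, data = l)
stargazer(reg2, type = "text")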
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                lwage           
## -----------------------------------------------
## educ                         0.084***          
##                               (0.014)          
##                                                
## age                          0.088***          
##                               (0.019)          
##                                                
## age2                         -0.087***         
##                               (0.023)          
##                                                
## male                         0.204***          
##                               (0.063)          
##                                                
## white                        -0.410***         
##                               (0.127)          
##                                                
## Constant                      -0.471           
##                               (0.426)          
##                                                
## -----------------------------------------------
## Observations                    298            
## R2                             0.272           
## Adjusted R2                    0.260           
## Residual Std. Error      0.532 (df = 292)      
## F Statistic           21.860*** (df = 5; 292)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

e. Explain how the coefficient on education should be interpreted.

The coefficient on education implies that, holding age, sex, and race constant, one more year of schooling is associated with wages that are about 8.4 percent higher on average.

f. Explain how the coefficient on the control variables should be interpreted.

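Wages rise with age at a decreasing rate and eventually fall, since the coefficient on age is positive and the coefficient on age squared is negative. Holding the other variables constant, men earn about 0.204 log points (about 23 percent) more than women on average, and white individuals earn about 0.410 log points (about 34 percent) less than non-white individuals.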
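The age at which predicted wages peak, and the exact percentage gaps implied by the dummy coefficients, can be worked out from the rounded coefficients in the table. This is a sketch only: the variable names are chosen here, and the results inherit the rounding of the reported coefficients.

```r
b_age   <- 0.088     # coefficient on age
b_age2  <- -0.087    # coefficient on age^2 / 100
b_male  <- 0.204     # coefficient on male
b_white <- -0.410    # coefficient on white

# d(lwage)/d(age) = b_age + 2 * b_age2 * age / 100, which is zero at the peak
peak_age <- -b_age / (2 * b_age2 / 100)
round(peak_age, 1)                                   # 50.6

# Exact percentage gap implied by a dummy coefficient in log points
pct <- function(b) 100 * (exp(b) - 1)
round(pct(c(male = b_male, white = b_white)), 1)     # male 22.6, white -33.6
```

The larger the coefficient in absolute value, the further the exact figure is from the log-point reading, which is why the gap matters more for the white coefficient than for the male coefficient.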