Homework #4

Part 1: Paper Using Randomized Data: Impact of Class Size on Learning

Article: Krueger (1999) Experimental Estimates of Education Production Functions QJE 114 (2): 497-532

a. What is the causal link the paper is trying to reveal?

The paper estimates the effect of classroom resources, specifically class size, on educational outcomes as measured by test scores.

b. What would be the ideal experiment to test this causal link?

The ideal experiment would be a randomized controlled trial in which a sufficiently large sample of randomly selected students is randomly assigned to classrooms of varying sizes. Comparing mean test scores between adjacent class-size groups would then give the causal estimate.

c. What is the identification strategy?

Randomization is used so that a student’s assignment to a class size is uncorrelated with any confounding variable. Under this design, the endogeneity problem is removed and the estimate can be claimed to be unbiased.
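The logic of randomization can be illustrated with a small simulation. This is only a sketch on made-up data, not the experimental data used in the paper: the variable names, the assumed true effect of 5 points, and the unobserved "ability" confounder are all hypothetical. It compares the estimate under random assignment with the estimate when higher-ability students select into small classes.

```r
# Simulated data only; every number below is an illustrative assumption.
set.seed(42)
n <- 10000
true_effect <- 5                 # assumed test-score gain from a small class

ability <- rnorm(n)              # unobserved confounder

# (1) Random assignment: class type is independent of ability
small_random <- rbinom(n, 1, 0.5)
score_random <- 50 + true_effect * small_random + 10 * ability + rnorm(n, sd = 5)

# (2) Self-selection: higher-ability students are more likely to be in small classes
small_selected <- as.integer(ability + rnorm(n) > 0)
score_selected <- 50 + true_effect * small_selected + 10 * ability + rnorm(n, sd = 5)

# Difference in mean scores, estimated by OLS on the small-class dummy
est_random   <- unname(coef(lm(score_random ~ small_random))["small_random"])
est_selected <- unname(coef(lm(score_selected ~ small_selected))["small_selected"])

cat("Randomized estimate:   ", round(est_random, 2), "\n")
cat("Self-selected estimate:", round(est_selected, 2), "\n")
```

Under random assignment the estimate should land close to the assumed true effect, while under self-selection it also picks up the ability gap between the two groups.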

d. What are the assumptions / threats to this identification strategy?

The identification strategy rests entirely on randomization, so the study is only as good as the randomization. The threat is that the environment is not completely controlled: students can enter and leave the treated classes. If, for instance, a new student entering a small class comes from a better school with smaller classes, the comparison no longer isolates the effect of class size, and the estimated impact on the outcome variables would be biased (most likely overstating the benefit of small classes).

Part 2.1: Paper using Twins for Identification: Economic Returns to Schooling

Article: Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173

a. What is the causal link the paper is trying to reveal?

The paper estimates the effect of one additional year of schooling on wages.

b. What would be the ideal experiment to test this causal link?

The ideal experiment would be a randomized controlled trial in which a sufficiently large sample of randomly selected individuals is randomly assigned to attend school for varying numbers of years. Comparing mean wages between adjacent groups (by years of schooling) would then give the causal estimate.

c. What is the identification strategy?

The identification strategy used in the paper is to compare wages within pairs of identical twins. Since the twins are genetically identical and share a family background, the difference in their wages can be attributed to the difference in their education.
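The within-pair comparison can be written out as a short derivation. The notation here is an assumption introduced for illustration: $y_{1i}$ and $y_{2i}$ are the log wages of the two twins in family $i$, $S_{1i}$ and $S_{2i}$ their years of schooling, $\mu_i$ the unobserved component shared within the pair (genes, family background), and $\varepsilon$ an idiosyncratic error.

```latex
\begin{align}
y_{1i} &= \alpha + \beta S_{1i} + \mu_i + \varepsilon_{1i} \\
y_{2i} &= \alpha + \beta S_{2i} + \mu_i + \varepsilon_{2i} \\
y_{1i} - y_{2i} &= \beta \,(S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i})
\end{align}
```

Differencing removes both $\alpha$ and $\mu_i$, so $\beta$ is identified as long as the schooling difference is uncorrelated with the difference in the idiosyncratic errors.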

d. What are the assumptions / threats to this identification strategy?

A crucial assumption of this identification strategy, as stated above, is that monozygotic twins are genetically identical and share a family background. A threat to this strategy is that, although the twins have similar innate biological and family characteristics, their tastes and preferences could differ, leading to different years of schooling and thus different wages. If those differences also affect wages directly, the within-pair estimate remains biased.

Part 2.2: Replication Analysis

a. Load Ashenfelter and Krueger AER 1994 data.

b. Reproduce the result from table 3 column 5.

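library(haven)      # read_dta() for Stata .dta files
library(stargazer)  # regression tables

# Load the twins data (one row per twin pair)
data <- read_dta("AshenfelterKrueger1994_twins.dta")

# Within-pair differences: schooling (diff1) and log wage (diff2)
data$diff1 <- data$educ1 - data$educ2
data$diff2 <- data$lwage1 - data$lwage2

# First-difference regression (table 3, column 5)
reg1 <- lm(diff2 ~ diff1, data = data)
stargazer(reg1, type = "text")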

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                diff2           
## -----------------------------------------------
## diff1                        0.092***          
##                               (0.024)          
##                                                
## Constant                      -0.079*          
##                               (0.045)          
##                                                
## -----------------------------------------------
## Observations                    149            
## R2                             0.092           
## Adjusted R2                    0.086           
## Residual Std. Error      0.554 (df = 147)      
## F Statistic           14.914*** (df = 1; 147)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

c. Explain how this coefficient should be interpreted.

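We can interpret the coefficient as follows: within twin pairs, one more year of education is associated with a wage that is about 9.2 percent higher on average (0.092 log points, since the dependent variable is the difference in log wages).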
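Because the dependent variable is in logs, reading the coefficient as a percent change is an approximation. A minimal sketch of the exact conversion, using the rounded coefficient from the table above (the variable names are hypothetical):

```r
b_diff <- 0.092                        # coefficient on diff1, rounded as reported above
pct_exact <- 100 * (exp(b_diff) - 1)   # exact percent change implied by a log-point effect

cat("Approximate:", 100 * b_diff, "percent\n")
cat("Exact:      ", round(pct_exact, 2), "percent\n")
```

For a coefficient this small the two numbers are close, so the percent reading is a reasonable shorthand.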

d. Reproduce the result in table 3 column 1.

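# Reshape from wide (one row per twin pair) to long (one row per twin)
l <- reshape(data,
             varying = c("educ1", "lwage1", "male1", "white1",
                         "educ2", "lwage2", "male2", "white2"),
             v.names = c("educ", "lwage", "male", "white"),
             timevar = "twin",
             times = c("T1", "T2"),
             idvar = c("famid", "age"),
             direction = "long")

# Sorted copy for inspection only; not used in the regression below
l.sort <- l[order(l$famid), ]

#Help for -reshape- command taken from https://stats.oarc.ucla.edu/r/faq/how-can-i-reshape-my-data-in-r/

# Age squared, divided by 100 to keep the coefficient readable
l$age2 <- l$age^2 / 100

# Pooled OLS on individual twins (table 3, column 1)
reg2 <- lm(lwage ~ educ + age + age2 + male + white, data = l)
stargazer(reg2, type = "text")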

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                lwage           
## -----------------------------------------------
## educ                         0.084***          
##                               (0.014)          
##                                                
## age                          0.088***          
##                               (0.019)          
##                                                
## age2                         -0.087***         
##                               (0.023)          
##                                                
## male                         0.204***          
##                               (0.063)          
##                                                
## white                        -0.410***         
##                               (0.127)          
##                                                
## Constant                      -0.471           
##                               (0.426)          
##                                                
## -----------------------------------------------
## Observations                    298            
## R2                             0.272           
## Adjusted R2                    0.260           
## Residual Std. Error      0.532 (df = 292)      
## F Statistic           21.860*** (df = 5; 292)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

e. Explain how the coefficient on education should be interpreted.

We can interpret the coefficient on education as follows: holding age, sex, and race constant, one more year of schooling is associated with a wage that is about 8.4 percent higher on average.

f. Explain how the coefficient on the control variables should be interpreted.

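Wages increase with age but then decline after a certain point, since the coefficient on age is positive and the coefficient on age squared is negative. Male individuals earn about 20.4 percent more than female individuals on average. White individuals earn about 41 percent less than non-white individuals on average. Both percentages are log-point approximations.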
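The age profile and the dummy coefficients can be made more concrete with a few lines of arithmetic. This is a sketch using the rounded coefficients from the table above, so the results are approximate. The variable names are hypothetical, and the turning point assumes age2 is defined as age^2/100, as in the code above.

```r
# Rounded coefficients copied from the regression table above
b_age   <- 0.088
b_age2  <- -0.087    # coefficient on age^2 / 100
b_male  <- 0.204
b_white <- -0.410

# lwage = ... + b_age * age + b_age2 * age^2 / 100
# Setting the derivative with respect to age to zero gives the turning point
peak_age <- -b_age / (2 * b_age2 / 100)

# Exact percent differences implied by log-point dummy coefficients
pct_male  <- 100 * (exp(b_male) - 1)
pct_white <- 100 * (exp(b_white) - 1)

cat("Age at which predicted log wage peaks:", round(peak_age, 1), "\n")
cat("Male vs. female, exact percent:       ", round(pct_male, 1), "\n")
cat("White vs. non-white, exact percent:   ", round(pct_white, 1), "\n")
```

With these rounded inputs the predicted wage peaks at roughly age 51. The exact percent differences are about 22.6 for male and about -33.6 for white, which shows that the log-point approximation becomes loose for large coefficients.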

Samyam Shrestha

Feb 6, 2022